United States Patent 5,155,809
Baker et al. October 13, 1992

Title

Uncoupling a central processing unit from its associated hardware for interaction with data handling apparatus alien to the operating system controlling said unit and hardware

Abstract

The functions of two virtual operating systems (e.g., S/370 VM, VSE or IX370 and S/88 OS) are merged into one physical system. Partner pairs of S/88 processors run the S/88 OS and handle the fault tolerant and single system image aspects of the system. One or more partner pairs of S/370 processors are coupled to corresponding S/88 processors directly and through the S/88 bus. Each S/370 processor is allocated from 1 to 16 megabytes of contiguous storage from the S/88 main storage. Each S/370 virtual operating system thinks its memory allocation starts at address 0, and it manages its memory through normal S/370 dynamic memory allocation and paging techniques. The S/370 is limit checked to prevent it from accessing S/88 memory space. The S/88 Operating System is the master over all system hardware and I/O devices. The S/88 processors access the S/370 address space in direct response to an S/88 application program so that the S/88 may move I/O data into the S/370 I/O buffers and process the S/370 I/O operations. The S/88 and S/370 peer processor pairs execute their respective Operating Systems in a single system environment without significant rewriting of either operating system. Neither operating system is aware of the other operating system or of the other processor pairs.
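The storage arrangement in the abstract (a contiguous 1 to 16 megabyte area of S/88 main storage that the S/370 addresses from 0, with a limit check keeping it out of S/88 space) behaves like base-and-limit relocation. The following is a minimal illustrative sketch under that assumption only; the names `S370Partition`, `translate` and `LimitCheckError` are hypothetical and are not taken from the patent or from the actual hardware.

```python
MB = 1 << 20  # one megabyte in bytes


class LimitCheckError(Exception):
    """Raised when an S/370 address falls outside its allocated storage."""


class S370Partition:
    """Hypothetical model of one S/370 allocation inside S/88 main storage."""

    def __init__(self, base: int, size: int) -> None:
        # The abstract allows from 1 to 16 megabytes of contiguous storage.
        if not (1 * MB <= size <= 16 * MB):
            raise ValueError("allocation must be between 1 and 16 MB")
        self.base = base  # S/88 real address where the allocation starts
        self.size = size  # length of the contiguous allocation in bytes

    def translate(self, s370_addr: int) -> int:
        """Map an S/370 address (which starts at 0) to an S/88 real address."""
        # Limit check: the S/370 must never reach S/88 memory outside its area.
        if not (0 <= s370_addr < self.size):
            raise LimitCheckError(
                f"address {s370_addr:#x} exceeds limit {self.size:#x}"
            )
        return self.base + s370_addr
```

For example, a 4 MB partition placed at S/88 address 32 MB maps S/370 address 0 to 32 MB, and any S/370 address at or beyond 4 MB is refused rather than relocated into S/88 storage.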


Inventors: Baker; Ernest D. (Boca Raton, FL), Dinwiddie, Jr.; John M. (West Palm Beach, FL), Grice; Lonnie E. (Boca Raton, FL), Joyce; James M. (Boca Raton, FL), Loffredo; John M. (Deerfield Beach, FL), Sanderson; Kenneth R. (West Palm Beach, FL)
Assignee: International Business Machines Corp. (Armonk, NY)
Appl. No.: 353114
Filed: May 17, 1989

Current U.S. Class: 709/227; 709/201
Field of Search: 364/228, 228.8, 232.3

U.S. Patent Documents
4004277, January 1977, Gavril
4099234, July 1978, Woods et al.
4214305, July 1980, Tokita et al.
4228496, October 1980, Katzman et al.
4244019, January 1981, Anderson et al.
4245344, January 1981, Richter
4315321, February 1982, Parks et al.
4316244, February 1982, Grondalski
4325116, April 1982, Kranz et al.
4354225, October 1982, Frieder et al.
4356550, October 1982, Katzman et al.
4365295, December 1982, Katzman et al.
4368514, January 1983, Persaud
4400775, August 1983, Nozaki et al.
4412281, October 1983, Works
4414620, November 1983, Tsuchimoto et al.
4418382, November 1983, Larson et al.
4453215, June 1984, Reid
4456954, June 1984, Bullions, III et al.
4486826, December 1984, Wolff et al.
4533996, August 1985, Hartung et al.
4563737, January 1986, Nakamura et al.
4564903, January 1986, Guyette et al.
4591975, May 1986, Wade et al.
4597084, June 1986, Pynneson et al.
4628508, December 1986, Sager et al.
4653112, March 1987, Ouimette
4654779, March 1987, Kato et al.
4654857, March 1987, Samson et al.
4674038, June 1987, Brelsford et al.
4677546, June 1987, Freeman et al.
4679166, July 1987, Berger et al.
4722048, January 1988, Hirsch et al.
4727480, February 1988, Albright et al.
4747040, May 1988, Blanset et al.
4750177, June 1988, Hendrie et al.
4812975, March 1989, Adachi et al.
4816990, March 1989, Williams
4855936, August 1989, Casey et al.
4920481, April 1990, Blinkley et al.
4994963, February 1991, Rorden et al.
Other References
Inselberg, Multiprocessor architecture ensures fault-tolerant transaction processing, Mini-Micro Systems, Apr. 1983.
IBM Systems Journal, vol. 27, no. 2, 1988, p. 93.
Selwyn, Parallel Processing and Expert Systems, pp. 311-314.
Weiser et al., Status and Performance of the Zmob Parallel Processing System, Spring COMPCON 85, IEEE, Feb. 25-28, 1985, pp. 71-74.
Ramachandran et al., Hardware Support for Interprocess Communication, 14th International Symposium on Computer Architecture, IEEE, Jun. 2-5, 1987.
Peacock, Application dictates your choice of a multiprocessor model, EDN, Jun. 25, 1987, pp. 241-246, 248.
Golkar et al., IBM-Compatible Mainframe in 20,000-Gate CMOS Arrays, VLSI Systems Design, May 20, 1987.
Primary Examiner: Lall; Parshotam S.
Assistant Examiner: Cosimano; E.
Attorney, Agent or Firm: Black; John C.; Brown, Jr.; Winfield J.

Claims


What is claimed is:
1. In a data processing system, a combination comprising
a processor unit and associated hardware for processing information under control of an operating system,
information handling apparatus alien to the operating system, and
means uncoupling the processor unit from said hardware and coupling the processor unit to said apparatus for interaction with said apparatus.

2. The system set forth in claim 1 wherein said means performs said uncoupling and coupling functions without the use of operating system services and without rejection by said operating system.

3. In a data processing system, a combination comprising
a processor unit and associated hardware for processing information under control of a virtual operating system,
information handling apparatus alien to the operating system, and
means including an application program running on said processor unit uncoupling the processor unit from said hardware and coupling the processor unit to said apparatus for interaction between said processor unit and said apparatus in accordance with said application program and without knowledge thereof by the operating system.

4. The system set forth in claim 3 wherein said means further comprises,
said processor unit effective during the execution of selected instructions of said application program for applying one of a plurality of preselected virtual addresses on its processor address bus, and
address decode logic means responsive to one of said preselected addresses on said bus for uncoupling the processor unit from said hardware and coupling the processor unit to the information handling apparatus for execution of said selected instructions in said processor unit and apparatus.

5. The system set forth in claim 4
wherein said processor unit includes an address strobe line coupled to said hardware, and
wherein said address decode logic means responds to the presence of one of said preselected addresses on said bus to block a processor signal on said address strobe line from said hardware and to couple said signal to the information handling apparatus for rendering the apparatus effective during the execution of one of said selected instructions.

6. In a data processing system, a combination comprising a first processing unit and hardware including a main storage and a plurality of I/O devices for executing programs under control of a first virtual operating system,
a second processing unit, alien to the first operating system for executing programs under control of a second operating system,
means including an application program running on the first processing unit uncoupling the first processing unit from said hardware and coupling the first processing unit to the second processing unit, and
means controlled by the first processing unit and the application program while that unit is uncoupled from said hardware for transferring data between the first and second processing units.

7. The system of claim 6 wherein said transfer of data between the first and second processing units is indiscernible to the first operating system.

8. In a data processing system,
a first processing unit and hardware including a main storage and a plurality of I/O devices for executing programs having a first instruction architecture under control of a first virtual operating system,
a second processing unit alien to said operating system, for executing programs having a different instruction architecture under control of a second virtual operating system differing from the first operating system,
means including an application program in the first processing unit uncoupling the first processing unit from said hardware and coupling the first processing unit directly to the second processing unit, and
means controlled by the first processing unit and the application program while that unit is uncoupled from said hardware for exchanging at least one of command and data information between the second unit and the first unit.

9. The system of claim 8 further comprising
means controlled by the first processing unit and the application program for converting command and data information transferred from the second unit to the first unit to commands executable by and data useable by the first processing unit.

10. The system of claim 9 wherein said first processing unit and said hardware are operated to process said converted command and data information under control of said first operating system.

11. In a data processing system,
a first processing unit and associated hardware for executing programs under control of a first virtual operating system, said processing unit having a processor address bus,
a second processing unit alien to said operating system, for executing programs under control of a second virtual operating system,
means directly coupling the second processing unit to the first processing unit for the transfer of information therebetween,
means including an application program running on the first processing unit for applying selected virtual addresses to said address bus,
logic means responsive to one of said selected virtual addresses on said bus for uncoupling the first processing unit from its associated hardware, and
means controlled by the first processing unit and the application program while the first processing unit is uncoupled for passing at least one of command and data information between the second unit and the first unit via said direct coupling means.

12. The system of claim 11 wherein said first processing unit includes a processor data bus and a processor control bus, and wherein said direct coupling means directly couples the second processing unit to the processor address, data and control buses of the first processing unit.

13. The system of claim 11 further comprising
means controlled by the first processing unit and the application program for converting command and data information transferred from the second processing unit to the first processing unit to commands executable by and data useable by the first processing unit.

14. The system of claim 13 wherein said first processing unit and hardware are operated to process said converted commands and data under control of said first operating system.

15. In a data processing system,
a first processing unit and associated hardware for executing programs under control of a first virtual operating system,
a second processing unit and associated hardware for executing programs under control of a second virtual operating system;
each processing unit and its associated hardware being alien to the operating system of the other processing unit;
means directly coupling the processing units to each other for the transfer of information therebetween;
means associated with each processing unit and including a respective application program running on that processing unit uncoupling that processing unit from its associated hardware and coupling that processing unit to the direct coupling means; and
means controlled by each processing unit and its respective application program while that unit is uncoupled, for exchanging information between that unit and the direct coupling means, whereby information may be exchanged directly between the processing units.

16. The system of claim 15 further comprising,
means associated with each one of the processing units for converting commands and data received from the other processing unit to commands executable by and data useable by the one processing unit.

17. The system of claim 15 wherein each said uncoupling means further includes
processing unit means responsive to an instruction of its respective application program for applying a unique virtual address on an address bus of that processing unit, and
logic means responsive to said unique virtual address for uncoupling its respective processing unit from its respective hardware.

18. The system of claim 15 wherein each direct coupling means includes a local storage accessible by each of the processing units while uncoupled for processing unit intercommunication.

19. The system of claim 18 wherein each processing unit and its associated hardware are operated to process its received and converted commands and data under control of its respective operating system.

20. In a data processing system,
a first processor unit and first associated hardware including main storage and I/O apparatus for processing information under control of a first virtual operating system,
a second processor unit and second associated hardware including main storage and I/O apparatus for processing information under control of a second virtual operating system,
each processor unit and its associated hardware being alien to the operating system of the other processor unit and hardware,
means directly coupling said processor units, and
means associated with the processor units and including respective application programs running on the processor units for uncoupling the processor units from their respective hardware and coupling the processor units to the direct coupling means for the transfer of commands and data between the processor units.

21. In a data processing system having a processor unit and hardware for processing information under control of an operating system, a method for permitting interaction between the processor unit and data handling apparatus alien to the operating system in a manner which is indiscernible by the operating system comprising the steps of
uncoupling the processor unit from said hardware during instruction execution; and
while the processor unit and hardware are uncoupled, coupling the processor unit to said apparatus for interaction therewith in accordance with said instruction execution.

22. In a data processing system of the type in which a processor unit interacts with hardware for processing information under control of an operating system, a method for transferring data between the processor unit and data handling apparatus alien to the operating system comprising the steps of
executing a selected data transfer instruction of a special application program in said processor unit;
during the execution of said instruction, uncoupling the processor unit from said hardware and coupling the processor unit to the data handling apparatus for transferring data between the processor unit and the apparatus in accordance with said instruction, whereby the data transfer is indiscernible by the operating system.

23. A method for uncoupling a processing unit from its associated hardware to permit an information transfer, in a manner indiscernible by its operating system, with apparatus alien to its operating system, comprising the steps of
executing in said processing unit a selected data transfer instruction of a special application program,
applying a predetermined virtual address on the address bus of the processing unit during the execution of said instruction,
decoding said address,
responsive to said decoding, blocking a processing unit address strobe signal from the associated hardware and applying it to the alien apparatus, thereby uncoupling the processing unit from its associated hardware and coupling the processing unit to the alien apparatus for data transfer.

24. The method of claim 23 further comprising the step of
terminating the instruction execution upon completion of the data transfer.

25. A method for uncoupling a processing unit from its associated hardware to permit an information transfer, in a manner indiscernible by its operating system, with apparatus alien to its operating system, comprising the steps of
executing in said processing unit a selected data transfer instruction of a special application program,
placing a predetermined virtual address on the address bus of the processing unit during the execution of said instruction,
decoding said address,
uncoupling the processing unit from its associated hardware and coupling the processing unit to the alien apparatus in response to said decoding for execution of said data transfer instruction with said alien apparatus.

26. The method of claim 25 further comprising the step of,
terminating the instruction execution upon completion of the data transfer.

27. In a data processing system, a combination comprising
a processor unit and associated hardware for processing information under control of an operating system,
information handling apparatus alien to the operating system, and
means uncoupling the processor unit from said hardware and coupling the processor unit to said apparatus to permit operation of said unit in isolation from said hardware and operating system while interacting with said apparatus.

28. The system set forth in claim 27 wherein said means performs said uncoupling and coupling functions in a manner indiscernible by said operating system.

29. A method for coupling a first processor operating under the control of one operating system to one or more I/O units without software drivers in said operating system for said units comprising the steps of
coupling the first processor for direct information transfer with a data processing system, including said I/O units and a second processor operated under the control of a second operating system,
selectively uncoupling the second processor from the data processing system in a manner indiscernible to the second operating system by means including an application program executing on the second processor,
selectively passing I/O commands and data from the first processor and its operating system to the second processor while uncoupled,
converting said commands and data to commands executable by and data usable by the second processor and its operating system, and
processing the converted commands and data in the data processing system.

30. The method of claim 29 further comprising the step of
selectively passing data directly from the second processor to said first processor while the second processor is uncoupled.

31. A method of coupling one processor of one architecture operating under one operating system to another processor of a different architecture operating under a different operating system, said other processor normally coupled to hardware including a main storage and a plurality of I/O devices, comprising the steps of
uncoupling the other processor from its hardware by means including an application program running on said other processor,
operating the other processor as an I/O controller for the one processor by
a. passing I/O commands and data from the one processor to the other processor in its uncoupled state,
b. converting said commands and data to commands executable by and data useable by the other processor, and
c. executing said converted commands in the other processor in its coupled state.

32. A method of transferring information from one processor operating under the control of one operating system to a second processor coupled to and operating with hardware including a main storage and a plurality of I/O devices under the control of a second operating system differing from the one operating system comprising the steps of
coupling one processor to the second processor via a direct path means,
uncoupling the second processor from the hardware and coupling the second processor to the direct path means under control of the second processor and an application program in a manner indiscernible to the second operating system,
while said second processor is uncoupled, transferring information between said one processor and said second processor via said direct path means under the control of said second processor and an application program running thereon, whereby the transfer is indiscernible to the second operating system.

33. In a data processing system,
a first processor having a bus structure and interacting with associated hardware for processing information under control of an operating system,
a second processor alien to the operating system and having a bus structure and means for initiating interrupt requests on its processor bus for the first processor for the transfer of commands and data thereto,
a direct coupling mechanism between the processors having a local storage for storing commands and data being transferred between the processors, a local bus structure adapted to couple the local storage to the processor bus structures of each processor for command and data transfers therebetween and logic means including a direct memory access controller (DMAC) for controlling the transfer of commands and data between said processor bus structures and the local storage via the local bus structure,
logic means including the DMAC responsive to one of the interrupt requests from the second processor for applying an interrupt signal to the first processor,
means including the first processor responsive to said signal for operating the first processor in isolation from its associated hardware and operating system for accessing an application program having a routine for servicing the interrupt request, and
means operating the first processor in isolation from its associated hardware and operating system under the control of said application program for transferring commands and data between the processors via said direct coupling mechanism pursuant to said interrupt request.

34. In a data processing system,
a processor and associated hardware for processing information under control of an operating system, said operating system having routines for handling interrupt requests from said processing system on plural priority levels,
data handling apparatus alien to the operating system and including means for initiating additional interrupt requests to the processor on one of said priority levels for data transfer with the processor, said routines being unable to service the interrupt requests from said apparatus;
an application program in the processing system including an additional interrupt handler routine to service the interrupt requests from said apparatus;
means effective upon the initiation of an interrupt request from said apparatus for directing the processor to the additional interrupt handler routine without rejection by said operating system, said system thereafter effective to execute said additional interrupt handler routine on said processor; and
means including said application program running on said processor unit selectively uncoupling the processor unit from said hardware and coupling the processor unit to said apparatus for data transfer without rejection by the operating system.

35. In a data processing system,
a first processing unit and associated hardware for executing programs under control of a first virtual operating system,
a second processing unit and associated hardware for executing programs under control of a second virtual operating system;
each processing unit and its associated hardware being alien to the operating system of the other processing unit;
means directly coupling the processing units to each other for the transfer of information therebetween;
means in each processing unit for initiating interrupt requests to the other processing unit when a data transfer is required between the processors,
means in each processing unit including an application program with interrupt handler routines to service interrupt requests from the other processing unit;
means associated with each processing unit and including a respective application program running on that processing unit uncoupling that processing unit from its associated hardware and coupling that processing unit to the direct coupling means; and
means controlled by each processing unit and its respective application program while that unit is uncoupled, for exchanging information between that unit and the direct coupling means, whereby information may be exchanged directly between the processing units.

36. In a data processing system, a combination comprising
a processor unit and associated hardware for processing information under control of an operating system,
information handling apparatus alien to the operating system, and
means uncoupling the processor unit from said hardware and coupling the processor unit to said apparatus to permit operation of said unit in isolation from said hardware and operating system while interacting with said apparatus, whereby said means performs said uncoupling and coupling functions without using the services of said operating system and without rejection by said operating system.

37. In a data processing system,
a first processor unit and first associated hardware for processing information under control of a first virtual operating system,
a second processor unit and second associated hardware for processing information under control of a second virtual operating system,
each processor unit and its associated hardware being alien to the operating system of the other processor unit and hardware,
means directly coupling said processor units, and
means associated with each processor unit and including a respective application program running on that processor unit for uncoupling that processor unit from its respective hardware and coupling that processor unit to the direct coupling means for the transfer of data between the processor units without using the services of and without rejection by either operating system.

Description

The subject application is related to other applications having different joint inventorships filed on the same day and assigned to a common assignee. These other applications are:

______________________________________
Serial No. | Title | Inventors
______________________________________
07/353116 | Fault Tolerant Data Processor System | E. D. Baker, J. M. Dinwiddie, L. E. Grice, J. M. Joyce, J. M. Loffredo, K. R. Sanderson, G. A. Suarez
07/353117 | Servicing Interrupt Requests In A Data Processing System Without Using The Services Of An Operating System | J. M. Dinwiddie, L. E. Grice, J. M. Loffredo, K. R. Sanderson
07/353113 | A Single Physical Main Storage Shared By Two Or More Processors Executing Respective Operating Systems | E. D. Baker, J. M. Dinwiddie, L. E. Grice, J. M. Loffredo, K. R. Sanderson, G. A. Suarez
07/353111 | Providing Additional System Characteristics To A Data Processing System | E. D. Baker, J. M. Dinwiddie, L. E. Grice, J. M. Joyce, J. M. Loffredo, K. R. Sanderson
07/353115 | Method And Apparatus For The Direct Transfer Of Information Between Application Programs Running On Distinct Processors Without Utilizing The Services Of One Or Both Operating Systems | E. D. Baker, J. M. Dinwiddie, L. E. Grice, J. M. Joyce, J. M. Loffredo, K. R. Sanderson
07/353112 | Data Processing System With System Resource Management For Itself And For An Associated Alien Processor | J. M. Dinwiddie, B. J. Freeman, L. E. Grice, J. M. Loffredo, K. R. Sanderson, G. A. Suarez
______________________________________

TABLE OF CONTENTS

Background of the Invention

Field of the Invention

Prior Art

Summary of the Invention

Brief Description of the Drawings

Description of the Preferred Embodiment

Introduction

1. Operating a Normally Non-Fault Tolerant Processor in a Fault Tolerant Environment

2. Uncoupling a Processor from Its Associated Hardware to Present Commands and Data from Another Processor to Itself

3. Presentation of Interrupts to a System Transparent to the Operating System

4. Sharing a Real Storage Between Two or More Processors Executing Different Virtual Storage Operating Systems

5. Single System Image

6. Summary

Prior Art System/88 Detail

Fault Tolerant S/370 Module 9 Interconnected via Links

General Description of Duplexed Processor Partner Units 21, 23

Coupling of S/370 and S/88 Processor Elements 85, 62

Processor to Processor Interface 89

1. I/O Adapter 154 (Note: Uses FIG. 18 re IOA)

2. I/O Adapter Channel 0 and Channel 1 Bus

3. The Bus Control Unit 156--General Description

4. Direct Memory Access Controller 209

5. Bus Control Unit 156--Detailed Description

(a) Interface Registers for High Speed Data Transfer

(b) BCU Uncouple and Interrupt Logic 215, 216

(c) BCU Address Mapping

(d) Local Address and Data Bus Operations

(e) S/88 Processor 62 and DMAC 209 Addressing To/From Local Storage 210

(f) BCU Basic Storage Module (BSM) RD/WR Byte Counter Operation

(g) Handshake Sequences BCU 156/Adapter 154

S/370 Processor Element 85

Processor Bus 170 and Processor Bus Commands

S/370 Storage Management Unit 81

1. Cache Controller 153

2. STCI 155

(a) Introduction

(b) System Bus Phases

(c) STCI Features

(d) Data Store Operations

(e) Data Fetch Operations

S/370 I/O Support

S/370 I/O Operations, Firmware Overview

System Microcode Design

1. Introduction

2. ETIO/EXEC370 Program Interface

3. EXEC370, S/370 Microcode Protocol

4. Instruction Flows Between S/370 Microcode and EXEC370

Operation of the Bus Control Unit (BCU) 156

1. Introduction

2. S/370 Start I/O Sequence Flow, General and Detailed Description

3. S/370 I/O Data Transfer Sequence Flow, General Description

(a) I/O Write Operations:

(b) I/O Read Operation:

(c) S/370 High Priority Message Transfer Sequence Flow

(d) BCU Status Command

(e) Programmed BCU Reset

Count, Key, and Data Track Format Emulation

1. The Object System

2. The Target System

3. The Emulation Format

4. Emulation Functions

Sharing of Real Storage 16 by S/88 and S/370

1. Introduction

2. Mapping S/88 Storage 16

3. Startup Procedure

4. Start S/370 Service Routine

5. Unthread Chosen String of MMC's From Free List

6. Writing Storage Base and Size to STCI

Initialization Functions for Uncoupling S/88 Interrupts Initiated by S/370

Gain Freedom Without Modifying the S/88 Operating System

Stealing Storage Without Modifying S/88 OS

Power on and Synchronization of Simplexed and Partner Units 21, 23

(S/88 Processing Unit as a Service Processor for S/370 Processing Unit)

1. Introduction

2. Fault Tolerant Hardware Synchronization

3. A Simplexed Processing Unit 21 is Powered On

(a) Hardware Implementation

(b) Microcode--Only Implementation

4. Duplexed Processing Units 21, 23 are Powered On

(a) Hardware Implementation

(b) Microcode--Only Implementation

5. A Partner 23 Is Inserted While The Other Unit 21 Processes Normally

(a) Hardware Implementation

(b) Microcode--Only Implementation

6. A Partner Detects A Compare Failure

(a) Hardware Implementation

(b) Microcode--Only Implementation

Alternative Embodiments

1. Use In Other (Non-S/88) Fault-Tolerant Systems

2. Direct Data Transfers Between S/88 I/O Controllers and S/370 Main Storage

3. Uncoupling Both Processors of a Directly Connected Pair

BACKGROUND OF THE INVENTION

Field of the Invention

The present application relates to an improved method and means for permitting a central processing unit (CPU) of a data processing system to interact with apparatus that is alien to the operating system under which the data processing system operates. That is, the operating system has no configuration data concerning the alien apparatus in its start-up configuration tables, yet the improvement permits the CPU to control the apparatus and/or exchange data and commands with it.

Prior Art

Data processing systems include a system configuration that specifies the devices and programs forming a particular processing system. The configuration file contains commands that the operating system executes as part of its start-up procedure.

Data processing systems typically include start-up code that initializes all configured devices. The procedure includes reading tables that specify the configurations of boards, disks and other devices connected to the system. If a device is not so configured, it typically cannot be coupled to the system processor for data transfer, because the operating system is unable to control the transfer; i.e., the device is "alien" to the operating system. The operating system would reject transfers involving such a device and would remove it as a source of spurious signals.
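The start-up behaviour described above can be modelled in a few lines to show why an unconfigured device is "alien". This is a hypothetical sketch, not any real operating system's interface: `OperatingSystem`, `start_up`, `request_transfer` and `TransferRejected` are illustrative names, and the configuration table is assumed to be a simple mapping from device name to driver name.

```python
from __future__ import annotations


class TransferRejected(Exception):
    """Raised when the OS is asked to drive a device it has no configuration for."""


class OperatingSystem:
    """Hypothetical model of start-up configuration handling, not a real OS API."""

    def __init__(self, config_table: dict[str, str]) -> None:
        # config_table maps device names to driver names, as read at start-up.
        self.config_table = dict(config_table)
        self.initialized: set[str] = set()

    def start_up(self) -> None:
        # Start-up code initializes every configured device and no others.
        for device in self.config_table:
            self.initialized.add(device)

    def request_transfer(self, device: str, data: bytes) -> int:
        # A device missing from the tables is "alien": the OS cannot control
        # a transfer with it, so the request is rejected.
        if device not in self.initialized:
            raise TransferRejected(f"{device} is alien to the operating system")
        return len(data)  # bytes accepted for transfer
```

Under this model the only way for the CPU to reach an alien device is to bypass `request_transfer` entirely, which is the gap the uncoupling technique summarized below is meant to fill.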

SUMMARY OF THE INVENTION

The present improvement permits the CPU to be coupled to and interact with such alien apparatus by providing a method and means for isolating or uncoupling the CPU from its associated system hardware and the operating system and coupling the CPU to the alien apparatus for interaction therewith.

In a preferred embodiment, a specific application program executing an instruction on the CPU places one of a group of predetermined virtual addresses on the CPU virtual address bus. Logic decodes the address, blocks a CPU address strobe signal from being transmitted to the other associated system hardware, and instead applies the address strobe signal to the alien apparatus. This permits the CPU to complete the function defined by the instruction with the alien apparatus rather than with the system hardware. For example, the instruction can define a command and/or data transfer between the CPU and the alien apparatus.
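The per-cycle decision made by the decode logic can be sketched in a few lines of Python. This is a minimal illustration, not the patented circuit: the address window `SPECIAL_BASE`/`SPECIAL_LIMIT` and the names `StrobeRouting` and `route_address_strobe` are hypothetical, since the passage says only that "a group of predetermined virtual addresses" is decoded.

```python
from dataclasses import dataclass

# Hypothetical window of "predetermined virtual addresses"; the actual
# values are not given in the text.
SPECIAL_BASE = 0x00F0_0000
SPECIAL_LIMIT = 0x00F1_0000  # exclusive upper bound


@dataclass(frozen=True)
class StrobeRouting:
    to_system_hardware: bool  # cache, address mapping, storage, I/O
    to_alien_apparatus: bool  # coupling logic / alien device


def route_address_strobe(virtual_address: int) -> StrobeRouting:
    """Decide, for one machine cycle, where the CPU's address strobe goes.

    The decision depends only on the address already on the virtual
    address bus, so it can be made before the strobe is asserted.
    """
    if SPECIAL_BASE <= virtual_address < SPECIAL_LIMIT:
        # Block the strobe from the system hardware (and hence from the
        # operating system's view) and hand it to the alien apparatus.
        return StrobeRouting(to_system_hardware=False, to_alien_apparatus=True)
    # Any other address: an ordinary cycle against the system hardware.
    return StrobeRouting(to_system_hardware=True, to_alien_apparatus=False)
```

The two destinations are mutually exclusive, so each cycle completes against exactly one of them, and the application program selects which one purely by the address it uses.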

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 diagrammatically illustrates the standard interconnection of computer systems by means of a communication line;

FIG. 2 shows diagrammatically the prior art interconnection of S/88 processors in a fault tolerant environment;

FIG. 3 shows diagrammatically the interconnection of S/370 processors with S/88 processors in the preferred embodiment;

FIG. 4 shows diagrammatically a S/370 system coupled to a S/88 system in the manner of the preferred embodiment;

FIG. 5 shows diagrammatically the uncoupling of a S/88 processor to provide data exchange between the S/370 and the S/88 of the preferred embodiment;

FIGS. 6A, 6B and 6C diagrammatically illustrate the prior art IBM System/88 module, plural modules interconnected by high speed data interconnections (HSDIs) and plural modules interconnected via a network in a fault tolerant environment with a single system image;

FIG. 7 diagrammatically illustrates one form of the improved module of the present invention which provides S/370 processors executing S/370 application programs under control of a S/370 operating system which are rendered fault tolerant by virtue of the manner in which the processors are connected to each other and to S/88 processors, I/O and main storage;

FIG. 8 diagrammatically illustrates in more detail the interconnection of paired S/370 units and S/88 units with each other to form a processor unit and their connection to an identical partner processor unit for fault tolerant operation;

FIGS. 9A and 9B illustrate one form of physical packaging of paired S/370 and S/88 units on two boards for insertion into the back panel of a processing system enclosure;

FIG. 10 conceptually illustrates S/88 main storage and sections of that storage dedicated to S/370 processor units without knowledge by the S/88 operating system;

FIG. 11 shows diagrammatically certain components of the preferred form of a S/370 processor and means connecting it to a S/88 processor and storage;

FIG. 12 shows the components of FIG. 11 in more detail and various components of a preferred form of a S/88 processor;

FIG. 13 diagrammatically illustrates the S/370 bus adapter;

FIGS. 14A, 14B and 15A-C illustrate conceptually the timing and movement of data across the output channels of the S/370 bus adapter;

FIG. 16 diagrammatically illustrates the direct interconnection between a S/370 and a S/88 processor in more detail;

FIG. 17 conceptually illustrates data flow between a S/370 bus adapter and a DMA controller of the interconnection of FIG. 16;

FIG. 18 shows DMAC registers for one of its four channels;

FIGS. 19A, 19B and 19C (with layout FIG. 19) are a schematic/diagrammatic illustration showing in more detail than FIG. 16 a preferred form of the bus control unit interconnecting a S/370 processor with a S/88 processor and main storage;

FIG. 20 is a schematic diagram of a preferred form of the logic uncoupling the S/88 processor from its associated system hardware and of the logic for handling interrupt requests from the alien S/370 processor to the S/88 processor;

FIG. 21 conceptually illustrates the modification of the existing S/88 interrupt structure for a module having a plurality of interconnected S/370 - S/88 processors according to the teachings of the present application;

FIGS. 22, 23 and 24 are timing diagrams for Read, Write and Interrupt Acknowledge cycles of the preferred form of the S/88 processors;

FIGS. 25 and 26 show handshake timing diagrams for adapter bus channels 0, 1 during mailbox read commands, Q select up commands, BSM read commands and BSM write commands;

FIG. 27 is a block diagram of a preferred form of a S/370 central processing element;

FIGS. 28 and 29 illustrate certain areas of the S/370 main storage and control storage;

FIG. 30 shows a preferred form of the interface buses between the S/370 central processing element, I/O adapter, cache controller, storage control interface and S/88 system bus, and processor;

FIG. 31 is a block diagram of a preferred form of a S/370 cache controller;

FIGS. 32A and 32B (with layout FIG. 32) schematically illustrate a preferred form of the storage control interface in greater detail;

FIG. 33 is a timing diagram illustrating the S/88 system bus phases for data transfer between units on the bus;

FIG. 34 is a fragmentary schematic diagram showing the "data in" registers of a paired storage control interface;

FIG. 35 shows formats of the command and store data words stored in the FIFO of FIG. 32B;

FIGS. 36A-D illustrate store and fetch commands from the S/370 processor and adapter which are executed in the storage control interface;

FIG. 37 illustrates conceptually the preferred embodiment of the overall system of the present application from a programmer's point of view;

FIGS. 38, 39 and 40 illustrate diagrammatically preferred forms of the microcode design for the S/370 and S/88 interface, the S/370 I/O command execution and the partitioning of the interface between EXEC 370 software and the S/370 I/O driver (i.e. ETIO+BCU+S/370 microcode) respectively;

FIGS. 41A and 41B illustrate conceptually interfaces and protocols between EXEC 370 software and S/370 microcode and between ETIO microcode and EXEC 370 software;

FIGS. 41C-H illustrate the contents of the BCU local store including data buffers, work queue buffers, queues, queue communication areas and hardware communication areas including a link list and the movement of work queue buffers through the queues, which elements comprise the protocol through which S/370 microcode and EXEC 370 software communicate with each other;

FIG. 42 illustrates conceptually the movement of work queue buffers through the link list and the queues in conjunction with the protocols between the EXEC 370, ETIO, S/370 microcode and the S/370 - S/88 coupling hardware;

FIG. 43 illustrates conceptually the execution of a typical S/370 Start I/O instruction;

FIGS. 44A-L illustrate diagrammatically the control/data flows for S/370 microcode and EXEC 370 as they communicate with each other for executing each type of S/370 I/O instruction;

FIGS. 45A-AG illustrate data, command and status information on the local address and data buses in the BCU during data transfer operations within the BCU;

FIGS. 46A-K illustrate conceptually a preferred form of disk emulation process whereby the S/88 (via the BCU, ETIO and EXEC 370) stores and fetches information on a S/88 disk in S/370 format in response to S/370 I/O instructions;

FIG. 47 illustrates conceptually the memory mapping of FIG. 10 together with a view of the S/88 storage map entries, certain of which are removed to accommodate one S/370 storage area;

FIGS. 48A-K illustrate a preferred form of virtual/physical storage management for the S/88 which can interact with newly provided subroutines during system start-up and reconfiguration routines to create S/370 storage areas within the S/88 physical storage;

FIGS. 49 and 50 are fragmentary diagrams illustrating certain of the logic used to synchronize S/370 - S/88 processor pairs and partner units; and

FIGS. 51 and 52 illustrate alternative embodiments of the present improvement.

DESCRIPTION OF THE PREFERRED EMBODIMENT

Introduction

The preferred embodiment for implementing the present invention comprises a fault tolerant system. Fault tolerant systems have typically been designed from the bottom up for fault tolerant operation. The processors, storage, I/O apparatus and operating systems have been specifically tailored to provide a fault tolerant environment. However, the breadth of their customer base, the maturity of their operating systems, and the number and extent of the available user programs are not as great as those of the significantly older mainframe systems of several manufacturers, such as the System/370 (S/370) system marketed by International Business Machines Corporation.

Today's fault tolerant data processing systems offer many advanced features that are not normally available on the older non-fault tolerant mainframe systems or that are not supported by the mainframe operating systems. Some of these features include: a single system image presented across a distributed computing network; the capability to hot plug processors and I/O controllers (remove and install cards with power on); instantaneous error detection, fault isolation and electrical removal from service of failed components without interruption to the computer user; customer replaceable units identified by remote service support; and dynamic reconfiguration resulting from component failure or adding additional devices to the system while the system is continuously operating. One example of such fault tolerant systems is the System/88 (S/88) system marketed by International Business Machines Corporation.

Proposals for incorporating the above features into the S/370 environment and architecture might typically consist of a major rewrite of the operating system(s) and user application programs and/or new hardware developed from scratch. However, the major rewrite of an operating system such as VM, VSE, IX370, etc. is considered by many to be a monumental task, requiring a large number of programmers and a considerable period of time. It usually takes more than five years for a complex operating system such as IBM S/370 VM or MVS to mature; until then, most system crashes result from operating system errors. Also, many years are required for users to develop proficiency in the use of an operating system. Unfortunately, once an operating system has matured and has developed a large user base, it is not a simple effort to modify the code to introduce new functions such as fault tolerance, dynamic reconfiguration, single system image, and the like.

Because of the complexities and expense of migrating a mature operating system into a new machine architecture, the designers will usually decide to develop a new operating system which may not be readily accepted by the using community. It may prove impractical to modify the mature operating system to incorporate the new features exemplified by the newly developed operating system; however, the new operating system may never develop a substantial user base, and will take many years of field usage before most problems are resolved.

Accordingly, it is intended that the present improvement will provide a fault tolerant environment and architecture for a normally non-fault-tolerant processing system and operating system without major rewrite of the operating system. In the preferred embodiment a model of IBM System/88 is coupled to a model of an IBM S/370.

One current method of coupling distinct processors and operating systems is through some kind of communications controller added to each system, appending device drivers to the operating systems, and using some kind of communication code such as Systems Network Architecture (SNA) or OSI to transport data. Normally, to accomplish data communications between end-node computers in a network, it is necessary that the end nodes each understand and apply a consistent set of services to data that is to be exchanged.

To reduce their design complexity, most networks are organized as a series of layers or levels, each one built upon its predecessor. The number of layers, the name of each layer, and the function of each layer differ from network to network. However, in all networks, the purpose of each layer is to offer certain services to the higher layers, shielding those layers from the details of how the offered services are actually implemented. Layer n on one machine carries on a conversation with layer n on another machine. The rules and conventions used in this conversation are collectively known as the layer n protocol. The entities comprising the corresponding layers on different machines are called peer processes, and it is the peer processes that are said to communicate using the protocol.

In reality, no data are directly transferred from layer n on one machine to layer n on another machine (except in the lowest or physical layer). That is, there can be no direct coupling of application programs operating on distinct or alien systems. Instead, each layer passes data and control information to the layer immediately below it, until the lowest layer is reached. At the lowest layer there is physical communication with the other machine, as opposed to the virtual communication used by the higher layers.

Definitions of these sets of services have existed in a number of different networks, as mentioned above, and more recently interest has centered on the provision of protocols to ease the interconnection of systems from different vendors. A structure for the development of these protocols is the framework defined by the seven-layer OSI (Open Systems Interconnection) model of the International Organization for Standardization (ISO). Each of the layers in this model is responsible for providing networking services to the layer above it while requesting services from the layer below it. The services provided at each layer are well defined so that they can be applied consistently by each station in the network. This is said to allow for the interconnection of different vendors' equipment. Implementation of layer-to-layer services within a node is implementation-specific and allows vendor differentiation on the basis of services provided within a station.

It is important to note that the entire purpose of implementing such a structured set of protocols is to perform end-to-end transfer of data. The major divisions within the OSI model can be better understood if one realizes that the user node is concerned with the delivery of data from the source application program to the recipient application program. To deliver this data, the OSI protocols act upon the data at each level to furnish frames to the network. Each frame is built up from the data together with the headers applied at each OSI level. These frames are then provided to the physical medium as a set of bits which are transmitted through the medium. At the receiving station they undergo the reverse set of procedures to deliver the data to the application program.
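The header wrapping just described can be illustrated with a toy Python model. It is a sketch only: the bracketed text headers are a hypothetical stand-in for the binary headers real protocols use, and the physical layer is represented simply by the returned byte string.

```python
# Layers that add a header, from the top of the stack downward. The
# physical layer only transmits the finished frame as bits.
LAYERS = ["application", "presentation", "session",
          "transport", "network", "data-link"]


def encapsulate(data: bytes) -> bytes:
    """Sending station: pass data down the stack, each layer prepending a header."""
    frame = data
    for layer in LAYERS:
        frame = f"[{layer}]".encode() + frame
    return frame  # handed to the physical medium


def decapsulate(frame: bytes) -> bytes:
    """Receiving station: strip headers in reverse order (outermost first)."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame
```

Only the outermost header is examined at each step on the receiving side, which mirrors the point made above: each layer deals with its peer's header and passes the remainder up unchanged.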

As stated earlier, one current method of coupling distinct processors and operating systems is through some kind of communications controller added to each system, appending device drivers to the operating systems, and using some kind of communication code such as Systems Network Architecture (SNA) or OSI to transport data. FIG. 1 shows a standard interconnection of two computer systems by means of a Local Area Network (LAN). In particular, an IBM S/370 architecture system is shown connected to an IBM System/88 architecture system. It will be observed that in each architecture an application program operates through an interface with the operating system to control a processor and access an I/O channel or bus. Each system has a communications controller to exchange data. In order to communicate, a multi-layered protocol must be used to allow data to be exchanged between the corresponding application programs.

An alternative method to exchange data would be a coprocessor method in which the coprocessor resides on the system bus, arbitrates for the system bus, and uses the same I/O as the host processor. The disadvantage of the coprocessor method is the amount of code rewrite required to support non-native (alien) host I/O. Another disadvantage is that the user must be familiar with both system architectures to switch back and forth between the coprocessor and host operating systems, which makes for an unfriendly user environment.

A prior art fault tolerant computer system has a processor module containing a processing unit, a random access memory unit, peripheral control units, and a single bus structure which provides all information transfers between the several units of the module. The system bus structure within each processor module includes duplicate partner buses, and each functional unit within a processor module also has a duplicate partner unit. The bus structure provides operating power to units of a module and system timing signals from a main clock.

FIG. 2 shows in the form of a functional diagram the structure of the processor unit portion of a processor module. By using identical paired processors mounted on a common replacement card and executing identical operations in synchronization, comparisons can be made to detect processing errors. Each card normally has a redundant partnered unit of identical structure.

The computer system provides fault detection at the level of each functional unit within the entire processor module. Error detectors monitor hardware operations within each unit and check information transfers between units. The detection of an error causes the processor module to isolate the unit which caused the error and to prohibit it from transferring information to other units, and the module continues operation by employing the partner of the faulty unit.

Upon detection of a fault in any unit, that unit is isolated and placed off-line so that it cannot transfer incorrect information to other units. The partner of the now off-line unit continues operating and thereby enables the entire module to continue operating. A user is seldom aware of such a fault detection and transition to off-line status, except for the display or other presentation of a maintenance request to service the off-line unit. The card arrangement allows easy removal and replacement.
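The duplexed-card behavior described in the preceding paragraphs can be sketched in Python, assuming each of a card's two processors produces one integer result per operation. `Card`, `module_output` and the integer results are hypothetical simplifications. The real compare logic checks many signals in hardware on every cycle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Card:
    """One replaceable card: two identical processors run in lockstep."""
    name: str
    online: bool = True

    def compare(self, out_a: int, out_b: int) -> None:
        # Compare logic on the card: any disagreement between its two
        # processors takes the card off-line, and it stays off-line.
        if out_a != out_b:
            self.online = False


def module_output(card: Card, partner: Card, outputs: dict) -> Optional[int]:
    """Run one operation on a duplexed pair of cards.

    `outputs` maps each card name to its (processor A, processor B) results.
    Returns the value driven onto the bus, or None if both cards are off-line.
    """
    for c in (card, partner):
        if c.online:
            c.compare(*outputs[c.name])
    for c in (card, partner):
        if c.online:
            # Only a card whose two processors still agree may drive the bus.
            return outputs[c.name][0]
    return None
```

Once a card is isolated it is skipped entirely, so the module keeps running on the partner until the failed card is replaced.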

The memory unit is also assigned the task of checking the system bus. For this purpose, the unit has parity checkers that test the address signals and that test the data signals on the bus structure. Upon determining that either bus is faulty, the memory unit signals other units of the module to obey only the non-faulty bus. The power supply unit for the processor module employs two power sources, each of which provides operating power to only one unit in each pair of partner units. Upon detecting a failing supply voltage, all output lines from the affected unit to the bus structure are clamped to ground potential to prevent a power failure from causing the transmission of faulty information to the bus structure.
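The bus check performed by the memory unit can be illustrated with a short Python sketch. It assumes odd parity computed over each whole 32-bit field; the text does not say whether parity is odd or even, or how many parity bits cover each field. `BusSignals`, `odd_parity_bit` and `bus_to_obey` are hypothetical names.

```python
from typing import NamedTuple


class BusSignals(NamedTuple):
    address: int
    address_parity: int
    data: int
    data_parity: int


def odd_parity_bit(value: int, width: int = 32) -> int:
    """Bit that makes the count of 1s in (value, parity bit) odd."""
    ones = bin(value & ((1 << width) - 1)).count("1")
    return 0 if ones % 2 else 1


def bus_is_good(sig: BusSignals) -> bool:
    # Test both the address signals and the data signals.
    return (odd_parity_bit(sig.address) == sig.address_parity
            and odd_parity_bit(sig.data) == sig.data_parity)


def bus_to_obey(bus_a: BusSignals, bus_b: BusSignals) -> str:
    """What the memory unit would tell the other units to obey."""
    a_ok, b_ok = bus_is_good(bus_a), bus_is_good(bus_b)
    if a_ok and b_ok:
        return "A and B"
    if a_ok:
        return "A only"
    if b_ok:
        return "B only"
    return "neither"
```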

FIG. 3 shows in the form of a functional diagram, the interconnection of paired S/370 processors with paired S/88 processors in the manner of a fault tolerant structure to enable the direct exchange of data. The similarity to the prior S/88 structure (FIG. 2) is intentional, but it is the unique interconnection by means of both hardware and software that establishes the operation of the preferred embodiment. It will be observed that the S/370 processors are coupled to storage control logic and bus interface logic in addition to the S/88 type compare logic. As will be described the compare logic will function in the same manner as the compare logic for the S/88 processors. Moreover the S/370 processors are directly coupled and coupled through the system bus to corresponding S/88 processors. As with the S/88 processor the S/370 processors are coupled in pairs and the pairs are intended to be mounted on field replaceable, hot-pluggable, circuit cards. The detailed interconnections of the several drivers will be described in greater detail later.

The preferred embodiment interconnects plural S/370 processors for executing the same S/370 instructions concurrently under control of a S/370 operating system. These are coupled to corresponding plural S/88 processors, I/O apparatus and main storage, all executing the same S/88 instructions concurrently under control of a S/88 operating system. As will be described later means are included to asynchronously uncouple the S/88 processors from their I/O apparatus and storage, to pass S/370 I/O commands and data from the S/370 processors to the S/88 processors while the latter are uncoupled, and to convert the commands and data to a form useable by the S/88 for later processing by the S/88 processors when they are recoupled to their I/O apparatus and main storage.

1. Operating a Normally Non-Fault Tolerant Processor in a Fault Tolerant Environment

The previously listed fault tolerant features are achieved in a preferred embodiment by coupling normally non-fault-tolerant processors such as S/370 processors in a first pair which execute the same S/370 instructions simultaneously under control of one of the S/370 operating systems. Means are provided to compare the states of various signals in one processor with those in the other processor for instantaneously detecting errors in one or both processors.

A second partner pair of S/370 processors with compare means is provided for executing the same S/370 instructions concurrently with the first pair and for detecting errors in the second pair. Each S/370 processor is coupled to a respective S/88 processor of a fault-tolerant system such as the S/88 data processing system having first and partner second pairs of processors, S/88 I/O apparatus and S/88 main storage. Each S/88 processor has associated therewith hardware coupling it to the I/O apparatus and main storage.

The respective S/370 and S/88 processors each have their processor buses coupled to each other by means including a bus control unit. Each bus control unit includes means which interacts with an application program running on the respective S/88 processor to asynchronously uncouple the respective S/88 processor from its associated hardware and to couple it to the bus control unit (1) for the transfer of S/370 commands and data from the S/370 processor to the S/88 processor and (2) for conversion of the S/370 commands and data to commands executable by and data useable by the S/88.

The S/88 data processing system subsequently processes the commands and data under control of the S/88 operating system. The S/88 data processing system also responds to error signals in either one of the S/370 processor pairs or in their respectively coupled S/88 processor pair to remove the coupled pairs from service and permit continued fault tolerant operation with the other coupled S/370, S/88 pairs. With this arrangement, S/370 programs are executed by the S/370 processors (with the assistance of the S/88 system for I/O operations) in a fault tolerant (FT) environment with the advantageous features of the S/88, all without significant changes to the S/370 and S/88 operating systems.

In addition, the storage management unit of the S/88 is controlled so as to assign dedicated areas in the S/88 main storage to each of the duplexed S/370 processor pairs and their operating system without knowledge by the S/88 operating system. The processors of the duplexed S/370 processor pairs are coupled individually to the common bus structure of the S/88 via a storage manager apparatus and S/88 bus interface for fetching and storing S/370 instructions and data from their respective dedicated storage area.

The preferred embodiment provides a method and means of implementing fault tolerance in the S/370 hardware without rewriting the S/370 operating system or S/370 applications. Full S/370 CPU hardware redundancy and synchronization is provided without custom designing a processor to support fault tolerance. A S/370 operating system and a fault tolerant operating system, (both virtual memory systems) are run concurrently without a major rewrite of either operating system. A hardware/microcode interface is provided in the preferred embodiment between peer processor pairs, each processor executing a different operating system. One processor is a microcode controlled IBM S/370 engine executing an IBM Operating System (e.g., VM, VSE, IX370, etc.). The second processor of the preferred embodiment is a hardware fault tolerant engine executing an operating system capable of controlling a hardware fault tolerant environment (e.g., IBM System/88), executing S/88 VOS (virtual operating system).

The hardware/microcode interface between the processor pairs allows the two operating systems to coexist in an environment perceived by the user as a single system environment. The hardware/microcode resources (memory, system buses, disk I/O, tape, communications I/O terminals, power and enclosures) act independently of each other while each operating system handles its part of the system function. The words memory, storage and store are used interchangeably herein. The fault tolerant (FT) processor(s) and operating system manage error detection/isolation and recovery, dynamic reconfiguration, and I/O operations. The non-fault-tolerant (NFT) processor(s) execute native instructions without any awareness of the FT processor. The FT processor appears to the NFT processor as multiple I/O channels.

The hardware/microcode interface allows both virtual memory processors to share a common fault tolerant memory. A contiguous block of storage from the memory allocation table of the FT processor is assigned to each NFT processor. The NFT processor's dynamic address translation feature controls the block of storage that was allocated to it by the FT processor. The NFT processor perceives that its memory starts at address zero through the use of an offset register. Limit checking is performed to keep the NFT processor within its own storage boundaries. The FT processor can access the NFT storage and DMA I/O blocks of data into or out of the NFT address space, whereas the NFT processor is prevented from accessing storage outside its assigned address space. The NFT storage size can be altered by changing the configuration table.
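The offset register and limit check can be sketched in Python. The sketch assumes the NFT processor's own dynamic address translation has already produced a zero-based real address. The class name `NftStorageWindow` and the example 16 MB block at a 48 MB base are hypothetical; the abstract states only that each S/370 is allocated 1 to 16 megabytes of contiguous S/88 storage.

```python
class NftStorageWindow:
    """Relocation and limit check for one NFT processor's storage block.

    base -- physical start of the contiguous block allocated by the FT
            processor (the value held in the offset register)
    size -- length of the block in bytes (the limit)
    """

    def __init__(self, base: int, size: int) -> None:
        self.base = base
        self.size = size

    def to_physical(self, nft_real_address: int) -> int:
        # Limit check: the NFT processor may not reach outside its block.
        if not 0 <= nft_real_address < self.size:
            raise PermissionError(
                f"address {nft_real_address:#x} outside {self.size:#x}-byte block")
        # The NFT side believes its storage starts at 0; add the offset.
        return self.base + nft_real_address
```

The check applies only on the NFT path. The FT processor addresses the same physical storage directly, which is what lets it move I/O data into and out of the NFT buffers.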

2. Uncoupling a Processor from Its Associated Hardware to Present Commands and Data from Another Processor to Itself.

Adding a new device to an existing processor and operating system generally requires hardware attachment via a bus or channel, and the writing of new device driver software for the operating system. The improved "uncoupling" feature allows two distinct processors to communicate with each other without attaching one of the processors to a bus or channel and without arbitrating for bus mastership. The processors communicate without significant operating system modification or the requirements of a traditional device driver. It can give to a user the image of a single system when two distinct and dissimilar processors are merged, even though each processor is executing its own native operating system.

This feature provides a method and means of combining the special features exhibited by a more recently developed operating system, with the users view and reliability of a mature operating system. It couples the two systems (hardware and software) together to form a new third system. It will be clear to those skilled in the art that while the preferred embodiment shows a S/370 system coupled to a S/88 system any two distinct systems could be coupled. The design criteria of this concept are: little or no change to the mature operating system so that it maintains its reliability, and minimal impact to the more recently developed operating system because of the development time for code.

This feature involves a method of combining two dissimilar systems each with its own characteristics into a third system having characteristics of both. A preferred form of the method requires coupling logic between the systems that functions predominantly as a direct memory access controller (DMAC). The main objective of this feature is to give an application program running in a fault tolerant processor (e.g., S/88 in the preferred embodiment) and layered on the fault tolerant operating system, a method of obtaining data and commands from an alien processor (e.g., S/370 in the preferred embodiment) and its operating system. Both hardware and software defense mechanisms exist on any processor to prevent intrusion (i.e. supervisor versus user state, memory map checking, etc.). Typically, operating systems tend to control all system resources such as interrupts, DMA Channels, and I/O devices and controllers. Therefore, to couple two different architectures and transfer commands and data between these machines without having designed this function from the ground up is considered by many a monumental task and/or impractical.

FIG. 4 shows diagrammatically a S/370 processor coupled to a S/88 processor in the environment of the preferred embodiment. By contrast with the S/370 processor shown in FIG. 1, the memory has been replaced by S/88 bus interface logic and the S/370 channel processor has been replaced by a bus adapter and bus control unit. Particular attention is directed to the interconnection between the S/370 bus control unit and the S/88 processor which is shown by a double broken line.

This feature involves attaching the processor coupling logic to the S/88 fault tolerant processor's virtual address bus, data bus, control bus and interrupt bus structure, and not to the system bus or channel as most devices are attached. The strobe line indicating that a valid address is on the fault tolerant processor's virtual address bus is activated a few nanoseconds after the address signals are activated. The coupling logic comprising the bus adapter and the bus control unit determines whether a preselected address range is presented by a S/88 application program before the strobe signal appears. If this address range is detected, the address strobe signal is blocked from going to the S/88 fault tolerant processor hardware. This missing signal will prevent the fault tolerant hardware and operating system from knowing a machine cycle took place. The fault tolerant checking logic in the hardware is isolated during this cycle and will completely miss any activity that occurs during this time. All cache, virtual address mapping logic and floating point processors on the processor bus will fail to recognize that a machine cycle has occurred. That is, all S/88 CPU functions are "frozen," awaiting the assertion of the Address Strobe signal by the S/88 processor.

The address strobe signal that was blocked from the fault tolerant processor logic is sent to the coupling logic. This gives the S/88 fault tolerant processor complete control over the coupling logic which is the interface between the fault tolerant special application program and the attached S/370 processor. The address strobe signal and the virtual address are used to select local storage, registers and the DMAC which are components of the coupling logic. FIG. 5 shows diagrammatically the result of the detection of an interrupt from the S/370 bus control logic which is determined to be at the appropriate level and corresponding to an appropriate address. In its broadest aspect therefore, the uncoupling mechanism disconnects a processor from its associated hardware and connects the processor to an alien entity for the efficient transfer of data with said entity.
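Inside the special address range, the redirected strobe is further decoded to reach one component of the coupling logic. The following minimal Python sketch uses a hypothetical layout: the offsets in `DECODE_MAP` are invented for illustration, because the passage names only the three kinds of target (local storage, registers and the DMAC).

```python
# Hypothetical layout of the special address window (offsets from its
# base). Each entry is (start, end-exclusive, component).
DECODE_MAP = (
    (0x0000, 0x8000, "local_store"),
    (0x8000, 0x8100, "registers"),
    (0x8100, 0x8200, "dmac"),
)


def select_component(offset: int) -> str:
    """Return the coupling-logic component that receives the redirected strobe."""
    for start, end, component in DECODE_MAP:
        if start <= offset < end:
            return component
    raise ValueError(f"offset {offset:#x} selects no component")
```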

The coupling logic has a local store which is used to queue incoming S/370 commands and to store data going to and from the S/370. The data and commands are moved into the local store by multiple DMA channels in the coupling logic. The fault tolerant application program initializes the DMAC and services interrupts from the DMAC, which notify the application program when a command has arrived or when a block of data has been received or sent. To complete an operation, the coupling logic must return the data strobe acknowledge signals before the clocking edge of the processor to ensure that both sides of the fault tolerant processor stay in sync.
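The command path through the local store can be modeled at the software level in Python. This is a minimal sketch only. `CouplingLocalStore`, the event strings and the command bytes are hypothetical. The sketch does not model bus timing such as returning the data strobe acknowledge signals before the clock edge.

```python
from collections import deque
from typing import Callable, Optional


class CouplingLocalStore:
    """Toy model of the coupling logic's local store and DMAC notifications."""

    def __init__(self, raise_interrupt: Callable[[str], None]) -> None:
        self.commands = deque()  # queued incoming S/370 commands
        self.raise_interrupt = raise_interrupt

    def dma_command_arrived(self, command: bytes) -> None:
        # A DMA channel has moved a command into local store...
        self.commands.append(command)
        # ...and the DMAC interrupts the FT application program.
        self.raise_interrupt("command arrived")

    def dma_block_done(self, direction: str) -> None:
        # A block of data has been received from, or sent to, the S/370.
        self.raise_interrupt(f"block {direction}")

    def next_command(self) -> Optional[bytes]:
        """Called from the application program's interrupt service routine."""
        return self.commands.popleft() if self.commands else None
```

Queuing in the local store decouples the two sides: the S/370 can post several commands before the application program services the first interrupt.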

The application program receives S/370 channel type commands such as Start I/O, Test I/O, etc. The application program then converts each S/370 I/O command into a fault tolerant I/O command and initiates a normal fault tolerant I/O command sequence.

This is believed to be a new method of getting a block of data around an operating system and to an application. It is also a way of allowing an application to handle an interrupt which is a function usually done by an operating system. The application program can switch the fault tolerant processor from its normal processor function to the I/O controller function at will, and on a per cycle basis, just by the virtual address it selects.

Thus, two data processing systems having dissimilar instruction and memory addressing architectures are tightly coupled so as to permit one system to effectively access any part of the virtual memory space of the other system without the other system being aware of the one system's existence. Special application code in the other system communicates with the one system via hardware by placing special addresses on the bus. Hardware determines if the address is a special one. If it is, the strobe is blocked from being sensed by the other system's circuits, and redirected such that the other system's CPU can control special hardware, and a memory space, accessible to both systems.

The other system can completely control the one system when necessary, as for initialization and configuration tasks. The one system cannot in any way control the other system, but may present requests for service to the other system in the following manner:

The one system stages I/O commands and/or data in one system format in the commonly accessible memory space and, by use of special hardware, presents an interrupt to the other system at a special level calling the special application program into action.

The special application program is directed to the memory space containing the staged information and processes it, converting its format to the other system's native form. The application program then directs the native operating system of the other system to perform native I/O operations on the converted commands and data. Thus, all of the foregoing occurs transparently to, and with no significant change in, the native operating systems of both systems.
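The conversion step performed by the special application program might look roughly like the following sketch. The opcode and device tables (`OPCODE_MAP`, `DEVICE_TO_PORT`), the native operation and port names, and the field layout of `S370Command` are all hypothetical; the text states only that each S/370 command is converted into a native I/O command. The buffer address is rebased because the S/370 sees its storage as starting at address zero, so the application must add the base of the S/370's storage block to reach the real buffer.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class S370Command:
    opcode: str       # e.g. "SIO" (Start I/O) or "TIO" (Test I/O)
    device: int       # S/370 device address (hypothetical values below)
    buffer_addr: int  # address as seen by the S/370, relative to its address 0


@dataclass(frozen=True)
class NativeRequest:
    operation: str    # hypothetical native operation name
    port: str         # hypothetical native I/O port name
    buffer_addr: int  # real address in the shared storage


# Hypothetical translation tables.
OPCODE_MAP: Dict[str, str] = {"SIO": "start_io", "TIO": "test_io"}
DEVICE_TO_PORT: Dict[int, str] = {0x190: "disk_port_1", 0x00E: "printer_port"}


def convert(cmd: S370Command, s370_base: int) -> NativeRequest:
    """Convert one staged S/370 command into a native request.

    s370_base is the real address at which the S/370's storage block begins.
    """
    return NativeRequest(
        operation=OPCODE_MAP[cmd.opcode],
        port=DEVICE_TO_PORT[cmd.device],
        buffer_addr=s370_base + cmd.buffer_addr,
    )
```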

3. Presentation of Interrupts to a System Transparent to the Operating System

Most current programs execute in one of two (or more) states, a supervisor state or a user state. Application programs run in user state, and functions such as interrupts run in supervisor state.

An application attaches an I/O port, opens the port, and issues an I/O request in the form of a read, write, or control operation. At that time the processor takes a task switch. When the operating system receives an interrupt signifying I/O completion, it puts this information into a ready queue sorted by priority for system resources.

The operating system reserves all interrupt vectors for its own use; none are available for new features such as an external interrupt signifying an I/O request from another machine.

In the S/88 of the preferred embodiment, a majority of the available interrupt vectors are actually unused, and these are set up to cause vectoring to a common error handler for `uninitialized` or `spurious` interrupts, as is the common practice in operating systems. The preferred embodiment of this improvement replaces a subset of these otherwise unused vectors with appropriate vectors to special interrupt handlers for the S/370 coupling logic interrupts. The modified S/88 Operating System is then rebound for use with the newly-integrated vectors in place.
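The vector replacement described above amounts to overwriting table entries that still point at the common spurious-interrupt handler. The sketch below assumes a 256-entry vector table and uses hypothetical handler names and vector numbers; it models the table contents only, not the rebinding of the operating system.

```python
from typing import Callable, Dict, List

Handler = Callable[[], str]


def spurious_interrupt() -> str:
    """Common handler for 'uninitialized' or 'spurious' interrupts."""
    return "spurious"


def build_vector_table(size: int, used: Dict[int, Handler]) -> List[Handler]:
    """Point every vector at the spurious handler, then fill in the used ones."""
    table: List[Handler] = [spurious_interrupt] * size
    for vector, handler in used.items():
        table[vector] = handler
    return table


def install_coupling_handlers(table: List[Handler], new: Dict[int, Handler]) -> None:
    """Replace only vectors that are otherwise unused (still spurious)."""
    for vector, handler in new.items():
        if table[vector] is not spurious_interrupt:
            raise ValueError(f"vector {vector} is already in use")
        table[vector] = handler
```

Refusing to overwrite a vector that is already in use reflects the text's point that only otherwise unused vectors are taken over.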

The System/88 of the preferred embodiment has eight interrupt levels and uses autovectors on all levels except level 4. The improvement of the present application uses one of these autovector levels, level 6, which has the next to highest priority. This level 6 is normally used by the System/88 for A/C power disturbance interrupts.

The logic which couples the System/370 to the System/88 presents interrupts to level 6 by ORing its interrupt requests with those of the A/C power disturbance. During system initialization, appropriate vector numbers to the special interrupt handlers for the coupling logic interrupts are loaded into the coupling logic (some, for example, into DMAC registers) by an application program, transparent to the S/88 operating system.

When any interrupt is received by the System/88, it initiates an interrupt acknowledge (IACK) cycle using only hardware and internal operations of the S/88 processor to process the interrupt and fetch the first interrupt handler instruction. No program instruction execution is required. However, the vector number must also be obtained and presented in a transparent fashion. This is achieved in the preferred embodiment by uncoupling the S/88 processor from its associated hardware (including the interrupt presenting mechanism for A/C power disturbances) and coupling the S/88 processor to the S/370-S/88 coupling logic when a level 6 interrupt is presented by the coupling logic.

More specifically, the S/88 processor sets the function code and the interrupt level at its outputs and also asserts Address Strobe (AS) and Data Strobe (DS) at the beginning of the IACK cycle. If the coupling logic interrupt presenting signal is active, the Address Strobe is blocked from the S/88 hardware, including the A/C power disturbance interrupt mechanism, and AS is sent instead to the coupling logic to read out the appropriate vector number, which is gated into the S/88 processor by the Data Strobe. Because the Address Strobe is blocked from the S/88 hardware, the machine cycle (IACK) is transparent to the S/88 Operating System as far as obtaining the coupling logic interrupt vector number is concerned.

If the coupling logic interrupt signal had not been active at the beginning of the IACK cycle, a normal S/88 level 6 interrupt would have been taken.
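The decision made at the start of the level 6 IACK cycle can be summarized as follows. The sketch assumes Motorola 68000-family conventions, in which the autovector for level n is vector 24 + n; the vector value supplied by the coupling logic and all names are hypothetical.

```python
from dataclasses import dataclass

LEVEL = 6
# Assumption: 68000-family convention, autovector for level n is vector 24 + n.
LEVEL6_AUTOVECTOR = 24 + LEVEL


@dataclass(frozen=True)
class IackResult:
    as_to_s88_hardware: bool    # did the A/C power disturbance logic see AS?
    as_to_coupling_logic: bool  # was AS redirected to the coupling logic?
    vector: int                 # vector number gated in with Data Strobe


def iack_cycle(coupling_irq_active: bool, coupling_vector: int) -> IackResult:
    """Model the level 6 interrupt acknowledge cycle.

    If the coupling logic's request is active when IACK begins, AS is blocked
    from the S/88 hardware and the vector comes from the coupling logic
    (loaded there earlier by an application program). Otherwise the normal
    level 6 (A/C power disturbance) autovector is taken.
    """
    if coupling_irq_active:
        return IackResult(False, True, coupling_vector)
    return IackResult(True, False, LEVEL6_AUTOVECTOR)
```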

4. Sharing a Real Storage Between Two or More Processors Executing Different Virtual Storage Operating Systems.

This feature couples a fault tolerant system to an alien processor and operating system that lacks code to support fault tolerant storage, i.e., code to support removal and insertion of storage boards via hot plugging, instantaneous detection of corrupted data and its recovery where appropriate, etc.

This feature provides a method and means whereby two or more processors each executing different virtual operating systems can be made to share a single real storage in a manner transparent to both operating systems, and wherein one processor can access the storage space of the other processor so that data transfers between these multiple processors can occur.

This feature combines two user-apparent operating system environments to give the user the appearance of a single operating system. Each operating system is a virtual operating system that normally controls its own complete real storage space. This invention has only one real storage space, which is shared by both processors via a common system bus. Neither operating system is substantially rewritten, and neither knows that the other exists or that the real storage is shared. This feature uses an application program running on a first processor to search through the first operating system's storage allocation queue. When a contiguous storage space large enough to satisfy the requirements of the second operating system is found, this storage space is removed, by manipulating pointers, from the first operating system's storage allocation table. The first operating system no longer has use (e.g., the ability to reallocate) of this removed storage unless the application returns the storage to the first operating system.

The first operating system is subservient to the second operating system from an I/O perspective and responds to the second operating system as an I/O controller. The first operating system is the master of all system resources, and in the preferred embodiment is a hardware fault tolerant operating system. The first operating system initially allocates and de-allocates storage (except for the storage which is "stolen" for the second operating system), and handles all associated hardware failures and recovery. The objective is to combine the two operating systems without altering the operating system code to any major degree. Each operating system must believe it is controlling all of system storage, since it is a single resource being used by both processors.

When the system is powered up, the first operating system and its processor assume control of the system, and hardware holds the second processor in a reset condition. The first operating system boots the system and determines how much real storage exists. The operating system eventually organizes all storage into 4KB (4096 bytes) blocks and lists each available block in a storage allocation queue. Each 4KB block listed in the queue points to the next available 4KB block. Any storage used by the first system is either removed or added in 4KB blocks from the top of the queue; and the block pointers are appropriately adjusted. As users request memory space from the operating system the requests are satisfied by assigning from the queue a required number of 4KB blocks of real storage. When the storage is no longer needed, the blocks will be returned to the queue.

Next the first operating system executes a list of functions called module-start-up that configures the system. One application that is executed by the module-start-up is a new application used to capture storage from the first operating system and allocate the storage to the second operating system. This program scans the complete storage allocation list and finds a contiguous string of 4KB blocks of storage. The application program then alters the pointers in the portion of the queue corresponding to the contiguous string of blocks, thereby removing a contiguous block of storage from the first operating system's memory allocation list. In the preferred embodiment, the pointer of the 4KB block preceding the first 4KB block removed is changed to point to the 4KB block immediately following the removed contiguous string of blocks.
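The pointer manipulation performed by the start-up application can be sketched with a singly linked free queue of 4KB block numbers. This is an illustrative model, not the S/88 data structure: it assumes the stolen blocks are both physically adjacent and consecutive in the queue, so that re-pointing the predecessor (or the queue head) past the run removes them in one step, as the text describes. `FreeQueue`, `steal_contiguous`, and `blocks_needed` are hypothetical names.

```python
from typing import Dict, List, Optional

BLOCK_SIZE = 4096  # 4KB blocks, as described in the text


def blocks_needed(megabytes: int) -> int:
    """Number of 4KB blocks for a partition size taken from the start-up table."""
    return megabytes * 1024 * 1024 // BLOCK_SIZE


class FreeQueue:
    """Free-storage queue: each free 4KB block points to the next free block."""

    def __init__(self, blocks: List[int]) -> None:
        self.head: Optional[int] = blocks[0] if blocks else None
        self.next_of: Dict[int, Optional[int]] = {}
        successors: List[Optional[int]] = [*blocks[1:], None]
        for cur, nxt in zip(blocks, successors):
            self.next_of[cur] = nxt

    def to_list(self) -> List[int]:
        out: List[int] = []
        cur = self.head
        while cur is not None:
            out.append(cur)
            cur = self.next_of[cur]
        return out

    def steal_contiguous(self, count: int) -> Optional[int]:
        """Unlink the first run of `count` physically adjacent blocks.

        Returns the first stolen block number, or None if no run is found.
        The entry preceding the run is re-pointed at the entry after the run.
        """
        if count <= 0:
            raise ValueError("count must be positive")
        before_run: Optional[int] = None  # queue entry preceding the run
        run_start: Optional[int] = None
        run_len = 0
        prev: Optional[int] = None
        cur = self.head
        while cur is not None:
            if run_len > 0 and prev is not None and cur == prev + 1:
                run_len += 1  # still physically contiguous
            else:
                before_run, run_start, run_len = prev, cur, 1
            if run_len == count:
                after_run = self.next_of[cur]
                if before_run is None:
                    self.head = after_run
                else:
                    self.next_of[before_run] = after_run
                for block in range(run_start, run_start + count):
                    del self.next_of[block]
                return run_start
            prev, cur = cur, self.next_of[cur]
        return None
```

The real base address handed to the offset hardware would then be the first stolen block number multiplied by `BLOCK_SIZE`; a 16 MB partition corresponds to `blocks_needed(16)`, that is, 4096 blocks.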

At this point the first operating system has no control over, or knowledge of, this real memory space unless the system is rebooted or the application returns the storage pointers. In effect, the first operating system treats the segment of real storage as though it were allocated to one of its own processes and could not be reallocated, because the blocks have been removed from the table rather than merely assigned to a user.

The removed address space is then turned over to the second operating system. There is hardware offset logic that makes the address block, stolen from the first operating system and given to the second operating system, appear to start at address zero to the second operating system. The second operating system then controls the storage stolen from the first operating system as if it is its own real storage, and controls the storage through its own virtual storage manager, i.e. it translates virtual addresses issued by the second system into real addresses within the assigned real storage address space.

An application program running on the first operating system can move I/O data into and out of the second processor's storage space; however, the second processor cannot read or write outside of its allocated space because the second operating system does not know of the additional storage. If a malfunction occurs in the second operating system, a hardware trap prevents it from inadvertently writing in the first operating system's space.
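The offset and limit checking applied to the second processor's storage references can be modeled as a base-plus-offset translation with a bounds check. In this sketch, `PartitionMapper` and `LimitTrap` are hypothetical names, and the 1 to 16 MB bounds come from the preferred embodiment; the input is the address the S/370 presents after its own virtual-to-real translation.

```python
class LimitTrap(Exception):
    """Raised when the S/370 presents an address outside its partition."""


class PartitionMapper:
    """Offset and limit check for one S/370 partition (behavioral model)."""

    MIN_SIZE = 1 * 1024 * 1024
    MAX_SIZE = 16 * 1024 * 1024

    def __init__(self, base: int, size: int) -> None:
        # base: real address of the storage taken from the first operating system
        # size: bytes assigned to this S/370 (1 to 16 MB in the preferred embodiment)
        if not self.MIN_SIZE <= size <= self.MAX_SIZE:
            raise ValueError("partition size must be 1 to 16 MB")
        self.base = base
        self.size = size

    def to_real(self, s370_address: int) -> int:
        """Map an S/370 address (starting at 0) to a shared-storage address."""
        if not 0 <= s370_address < self.size:
            # The hardware trap: the S/370 can never reach storage outside its block.
            raise LimitTrap(hex(s370_address))
        return self.base + s370_address
```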

The amount of storage space allocated to the second operating system is defined by the user in a table in the module-start-up program. If the user wants the second processor to have 16 megabytes, he defines that size in the module-start-up table, and the application acquires that much space from the first operating system. A special SVC (service call) allows the application program to gain access to the supervisor region of the first operating system so that the pointers can be modified.

An important reason why it is desirable for both operating systems to share the same storage is that the storage is fault tolerant on the first processor; and the second processor is allowed to use fault tolerant storage and I/O from the first processor. The second processor is made to be fault tolerant by replicating certain of the hardware and comparing certain of the address, data, and control lines. Using these techniques the second processor is, in fact, a fault tolerant machine even though the second operating system has no fault tolerant capabilities. More than one alien processor and operating system of the second type can be coupled to the first operating system with a separate real storage area provided for each alien processor.

In the preferred embodiment, the first operating system is that of the fault tolerant S/88, the second operating system is one of the S/370 operating systems, and the first and second processors are S/88 and S/370 processors, respectively. This feature not only enables a normally non-fault-tolerant system to use fault tolerant storage maintained by a fault tolerant system, but also enables the non-fault-tolerant system (1) to share access to fault tolerant I/O apparatus maintained by the fault tolerant system and (2) to exchange data between the systems more efficiently, without the significant delays of a channel-to-channel coupling.

5. Single System Image

The term single system image is used to characterize computer networks in which user access to remote data and resources (e.g., printer, hard file, etc.) appears to the user to be the same as access to data and resources at the local terminal to which the user's keyboard is attached. Thus, the user may access a data file or resource simply by name and without having to know the object's location in the network.

The concept of "derived single system image" is introduced here as a new term, and is intended to apply to computer elements of a network which lack facilities to attach directly to a network having a single system image, but utilize hardware and software resources of that network to attach directly to same with an effective single system image.

For purposes of this discussion, direct attachment of a computer system, for developing effects of "derived single system image," can be effectuated with various degrees of coupling between that system and elements of the network. The term "loose coupling" as used here means a coupling effectuated through I/O channels of the deriving computer and the "native" computer which is part of the network. "Tight coupling" is a term presently used to describe a relationship between the deriving and native computers which is established through special hardware allowing each to communicate with the other on a direct basis (i.e., without using existing I/O channels of either).

A special type of tight coupling presently contemplated, termed "transparent tight coupling," involves the adaptation of the coupling hardware to enable each computer (the deriving and native computers) to utilize resources of the other computer in a manner such that the operating system of each computer is unaware of such utilization. Transparent tight coupling, as just defined, forms a basis for achieving cost and performance advantages in the coupled network. The cost of the coupling hardware, notwithstanding complexity of design, should be more than offset by the savings realized by avoiding the extensive modifications of operating system software which otherwise would be needed. Performance advantages flow from faster connections due to the direct coupling and reduced bandwidth interference at the coupling interface.

The term "network" as used in this section is more restricted than the currently prevalent concept of a network which is a larger international teleprocessing/satellite connection scheme to which many dissimilar machine types may connect if in conformance to some specific protocol. Rather "network" is used in this section to apply to a connected complex of System/88 processors or alternatively to a connected complex of other processors having the characteristics of a single system image.

Several carefully defined terms will be used to further explain the concept of a single system image as contemplated herein; and it will be assumed that the specific preferred embodiment of the improvement will be used as the basis for the clarification:

a. High Speed Data Interconnection (HSDI) refers to a hardware subsystem (and cable) for data transfer between separate hardware units.

b. Link refers to a software construct or object which consists entirely of a multi-part pointer to some other software object and which has much of the character of an alias name.

c. MODULE refers to a free-standing processing unit consisting of at least one each of: enclosure, power supply, CPU, memory, and I/O device. A MODULE can be expanded by bolting together multiple enclosures to house additional peripheral devices creating a larger single module. Some I/O units (terminals, printers) may be external and connected to the enclosure by cables; they are considered part of the single MODULE. A MODULE may have only one CPU complex.

d. CPU COMPLEX refers to one or more single or dual processor boards within the same enclosure, managed and controlled by Operating System software to operate as a single CPU. Regardless of the actual number of processor boards installed, any user program or application is written, and executed, as if only one CPU were present. The processing workload is roughly shared among the available CPU boards, and multiple tasks may execute concurrently, but each application program is presented with a `SINGLE-CPU IMAGE.`

e. OBJECT refers to a collection of data (including executable programs) stored in the system (disk, tape) which can be uniquely identified by a hierarchical name. A LINK is a uniquely-named pointer to some other OBJECT, and so is considered an OBJECT itself. An I/O PORT is a uniquely-named software construct which points to a specific I/O device (a data source or target), and thus is also an OBJECT. The Operating System effectively prevents duplication of OBJECT NAMES.

Because the term `single system image` is not used consistently in the literature, it will be described in greater detail for clarification of the present improvement of a "derived single system image." In defining and describing the term SINGLE-SYSTEM IMAGE, the `image` refers to the application program's view of the system and environment. `System,` in this context, means the combined hardware (CPU complex) and software (Operating System and its utilities) to which the application programmer directs his instructions. `Environment` means all I/O devices and other connected facilities which are addressable by the Operating System and thus accessible indirectly by the programmer, through service requests to the Operating System.

A truly single, free-standing computer with its Operating System, then, must provide a SINGLE-SYSTEM IMAGE to the programmer. It is only when we want to connect multiple systems together in order to share I/O devices and distribute processing that this `image` seen by the programmer begins to change; the ordinary interconnection of two machines via teleprocessing lines (or even cables) forces the programmer to understand--and learn to handle--the dual environment, in order to take advantage of the expanded facilities.

Generally, in order to access facilities in the other environment, he must request his local Operating System to communicate his requirements to the `other` Operating System, and specify those requirements in detail. He must then be able to accept the results of his request asynchronously (and in proper sequence) after an arbitrarily long delay. The handling and control of the multiple messages and data transfers between machines constitute significant processing overhead in both machines; it can be unwieldy, inefficient, and difficult for the programmer in such a DUAL-SYSTEM environment. And when the number of conventionally-connected machines goes up, the complexity for the programmer can increase rapidly.

The System/88 original design included the means to simplify this situation and provide the SINGLE-SYSTEM IMAGE to the programmer, i.e., the HSDI connection between MODULEs, and HSDI drive software within the Operating System in each MODULE. Here, in a two-MODULE system for example, each of the two Operating Systems `knows about` the entire environment, and can access facilities across the HSDI without the active intervention of the `other` Operating System. The reduction in communications overhead is considerable.

A large number of MODULEs of various sizes and model types can be interconnected via HSDI to create a system complex that appears to the programmer as one (expandable) environment. His product, an application program, can be stored on one disk in this system complex, executed in any of the CPUs in the complex, controlled or monitored from essentially any of the terminals of the complex, and can transfer data to and from any of the I/O devices of the complex, all without any special programming considerations and with improved execution efficiency over the older methods.

The operating system and its various features and facilities are written in such a way as to natively assume the distributed environment and operate within that environment with the user having no need to be concerned with or have control over where the various entities (utilities, applications, data, language processors, etc.) reside. The key to making all of this possible is the enforced rule that each OBJECT must have a unique name; and this rule easily extends to the entire system complex since the most basic name-qualifier is the MODULE name, which itself must be unique within the complex. Therefore, locating any OBJECT in the entire complex is as simple as correctly naming it. Naming an OBJECT is in turn simplified for the programmer by the provision of LINKs, which allow the use of very short alias pointers to (substitute names for) OBJECTS with very long and complicated names.
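Name resolution in such a complex can be sketched as a single table of unique names whose first component is the MODULE name, with LINKs followed until a real OBJECT is reached. The `/` separator, the hop limit, and the class names are illustrative assumptions, not the S/88 path-name syntax.

```python
from typing import Dict, Tuple, Union


class Link:
    """A uniquely named pointer (alias) to another OBJECT's full name."""

    def __init__(self, target: str) -> None:
        self.target = target


class Namespace:
    """Complex-wide OBJECT names; the first component is the MODULE name.

    The "/" separator is an illustrative assumption, not the S/88 syntax.
    """

    def __init__(self) -> None:
        self.objects: Dict[str, Union[bytes, Link]] = {}

    def add(self, name: str, obj: Union[bytes, Link]) -> None:
        if name in self.objects:  # enforce unique OBJECT names
            raise ValueError(f"duplicate object name: {name}")
        self.objects[name] = obj

    def resolve(self, name: str, max_hops: int = 8) -> Tuple[str, str]:
        """Follow LINKs to a real OBJECT; return (module, full name)."""
        for _ in range(max_hops):
            obj = self.objects[name]
            if not isinstance(obj, Link):
                return name.split("/", 1)[0], name
            name = obj.target
        raise RuntimeError("too many link hops")
```

Because the MODULE name leads every full name, resolving a name also tells the system which MODULE holds the OBJECT, so the programmer never has to supply its location.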

To achieve the concept of a "derived single system image" within this complex of interconnected S/88 modules, a plurality of S/370 processors are coupled to S/88 processors in such a manner as to provide the S/370 processor users with at least some aspects of the S/88 single system image features, even though the S/370 processors and operating systems do not themselves provide these features.

One or more S/370 processors are provided within the S/88 MODULE. A S/88 processor is uniquely coupled to each S/370 processor. As will be seen, each S/370 processor is replicated and controlled by S/88 software for fault-tolerant operation. The unique direct coupling of the S/88 and S/370 processors, preferably by the uncoupling and interrupt function mechanisms described above, renders data transfers between the processors transparent to both the S/370 and S/88 operating systems. Neither operating system is aware of the existence of the other processor or operating system.

Each S/370 processor uses the fault-tolerant S/88 system complex to completely provide the S/370 main storage, and emulated S/370 I/O Channel(s) and I/O device(s). The S/370s have no main memory, channels, or I/O devices which are not part of the S/88, and all of these facilities are fault-tolerant by design.

At system configuration time, each S/370 processor is assigned a dedicated contiguous block of 1 to 16 megabytes of main storage from the S/88 pool; this block is removed from the configuration tables of the S/88 so that the S/88 Operating System cannot access it, even inadvertently. Fault-tolerant hardware registers hold the storage block pointer for each S/370, so that the S/370 has no means to access any main storage other than that assigned to it. The result is an entirely conventional, single-system view of its main memory by the S/370; the fault-tolerant aspect of the memory is completely transparent. An application program (EXEC370) in the S/88 emulates S/370 Channel(s) and I/O device(s) using actual S/88 devices and S/88 Operating System calls. It has the SINGLE-SYSTEM IMAGE view of the S/88 complex, since it is an application program; thus this view is extended to the entire S/370 `pseudo-channel.`

From the opposite point of view, that of the S/370 Operating System (and application programs by extension), it may help to visualize a `window` (the channel) through which all I/O operations take place. The window is not altered in character--no S/370 programs need be changed--but the `view` through the window is broadened to include the SINGLE-SYSTEM IMAGE attributes. A small conceptual step then pictures a large number of S/370s efficiently sharing a single database, that managed by the S/88.

A consequence of this connection technique is relatively simple and quick dynamic reconfigurability of each S/370. The channel `window` is two-way, and the S/88 control program EXEC370 is on the other side of it; EXEC370 has full capability to stop, reset, reinitialize, reconfigure, and restart the S/370 CPU. Thus, by transparent emulation of S/370 I/O facilities using other facilities which possess the SINGLE-SYSTEM IMAGE attribute (S/88 I/O and Operating System), this attribute is extended and afforded to the S/370.

The S/370 therefore has been provided with object location independence. Its users may access a data file or other resource by name, a name assigned to it in the S/88 operating system directory. The user need not know the location of the data file in the complex of S/370-S/88 modules.

S/370 I/O commands issued by one S/370 processing unit in one module 9 are processed by an associated S/88 processing unit tightly coupled to the S/370 processing unit in the same module (or by other S/88 processing units interconnected in the module 9 and controlled by the same copy of the S/88 virtual operating system which supports multiprocessing) to access data files and the like resident in the same or other connected modules. It may return the accessed files to the requesting S/370 processing unit or send them to other modules, for example, to merge with other files.

6. Summary

Thus, the functions of two virtual operating systems (e.g., S/370 VM, VSE or IX370 and S/88 OS) are merged into one physical system. The S/88 processor runs the S/88 OS and handles the fault tolerant aspects of the system. At the same time, one or more S/370 processors are plugged into the S/88 rack and are allocated by the S/88 OS anywhere from 1 to 16 megabytes of contiguous memory per S/370 processor. Each S/370 virtual operating system thinks its memory allocation starts at address 0 and it manages its memory through normal S/370 dynamic memory allocation and paging techniques. The S/370 is limit checked to prevent the S/370 from accessing S/88 memory space. The S/88 must access the S/370 address space since the S/88 must move I/O data into the S/370 I/O buffers. The S/88 Operating System is the master over all system hardware and I/O devices. The peer processor pairs execute their respective Operating Systems in a single system environment without significant rewriting of either operating system.

Introduction--Prior Art System/88

The improvements of the present application will be described with respect to a preferred form in which IBM System/370 (S/370) processing units (executing S/370 instructions under the control of any one of the S/370 operating systems such as VM, VSE, IX370, etc.) are tightly coupled to IBM System/88 (S/88) processing units (executing S/88 instructions in a fault tolerant manner under control of a S/88 operating system in a fault tolerant environment) in a manner which permits fault tolerant operation of the S/370 processing units with the System/88 features of single system image, hot pluggability, instantaneous error detection, I/O load distribution and fault isolation and dynamic reconfigurability.

The IBM System/88 marketed by International Business Machines Corp. is described generally in the IBM System/88 Digest, Second Edition, published in 1986 and other available S/88 customer publications. The System/88 computer system including module 10, FIG. 6A, is a high availability system designed to meet the needs of customers who require highly reliable online processing. System/88 combines a duplexed hardware architecture with sophisticated operating system software to provide a fault tolerant system. The System/88 also provides horizontal growth through the attachment of multiple System/88 modules 10a, 10b, 10c, through the System/88 high speed data interconnections (HSDIs), FIG. 6B, and modules 10d-g through the System/88 Network, FIG. 6C.

The System/88 is designed to detect a component failure when and where it occurs, and to prevent errors and interruptions caused by such failures from being introduced into the system. Since fault tolerance is a part of the System/88 hardware design, it does not require programming by the application developer. Fault tolerance is accomplished with no software overhead or performance degradation. The System/88 achieves fault tolerance through the duplication of major components, including processors, direct access storage devices (DASDs) or disks, memory, and controllers. If a duplexed component fails, its duplexed partner automatically continues processing and the system remains available to the end users. Duplicate power supplies with battery backup for memory retention during a short-term power failure are also provided. System/88 and its software products offer ease of expansion, the sharing of resources among users, and solutions to complex requirements while maintaining a single system image to the end user.

A single system image is a distributed processing environment consisting of many processors, each with its own files and I/O, interconnected via a network or LAN, that presents to the user the impression he is logged on to a single machine. The operating system allows the user to converse from one machine to another just by changing a directory.

With proper planning, the System/88 processing capacity can be expanded while the System/88 is running and while maintaining a single-system image to the end user. Horizontal growth is accomplished by combining multiple processing modules into systems using the System/88 HSDI, and combining multiple systems into a network using the System/88 Network.

A System/88 processing module is a complete, stand-alone computer as seen in FIG. 6A of the drawings. A System/88 system is either a single module or a group of modules connected in a local network with the IBM HSDI as seen in FIG. 6B. The System/88 Network, using remote transmission facilities, is the facility used to interconnect multiple systems to form a single-system image to the end user. Two or more systems can be interconnected by communications lines to form a long haul network. This connection may be through a direct cable, a leased telephone line, or an X.25 network. The System/88 Network detects references to remote resources and routes messages between modules and systems completely transparent to the user.

Hot pluggability allows many hardware replacements to be done without interrupting system operation. The System/88 takes a failing component out of service, continuing service with its duplexed partner, and lights an indicator on the failing component--all without operator intervention. The customer or service personnel can remove and replace a failed duplexed board while processing continues. The benefits to a customer include timely repair and reduced maintenance costs.

Although the System/88 is a fault-tolerant, continuous operation machine, there are times when machine operation will need to be stopped. Some examples of this are to upgrade the System/88 Operating System, to change the hardware configuration (add main storage), or to perform certain service procedures.

The duplexed System/88 components and the System/88 software help maintain data integrity. The System/88 detects a failure or transient error at the point of failure and does not propagate it throughout the application or data. Data is protected from corruption and system integrity is maintained. Each component contains its own error-detection logic and diagnostics. The error-detection logic compares the results of parallel operations at every machine cycle.
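The per-cycle comparison of parallel operations can be illustrated with a minimal software sketch. It assumes a unit is modeled as two duplicated "sections" (plain Python functions) that receive the same inputs; the names `lockstep_cycle`, `add16`, and `add16_stuck_bit0` are hypothetical and do not come from the System/88 design. The hardware does this comparison in logic on every machine cycle, not in software.

```python
from typing import Callable, Optional, Tuple

Section = Callable[[int, int], int]


def lockstep_cycle(section_a: Section, section_b: Section,
                   x: int, y: int) -> Tuple[Optional[int], bool]:
    """Evaluate one operation on two duplicated sections and compare results.

    Returns (result, False) when the sections agree, or (None, True) when
    they disagree, i.e. an error was detected on this cycle.
    """
    out_a = section_a(x, y)
    out_b = section_b(x, y)
    if out_a != out_b:
        return None, True  # the unit would be removed from service here
    return out_a, False


def add16(x: int, y: int) -> int:
    """A healthy 16-bit adder."""
    return (x + y) & 0xFFFF


def add16_stuck_bit0(x: int, y: int) -> int:
    """The same adder with output bit 0 stuck at 1 (an injected fault)."""
    return add16(x, y) | 1
```

With the faulty copy, `lockstep_cycle(add16, add16_stuck_bit0, 2, 3)` still returns `(5, False)` because the stuck bit happens to match the correct answer; `lockstep_cycle(add16, add16_stuck_bit0, 2, 4)` returns `(None, True)`. Comparison catches a fault on the first cycle in which it actually changes a result.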

If the system detects a component malfunction, that component is automatically removed from service. Processing continues on its duplexed partner while the error-detection functions automatically run internal diagnostics on the failed component. If the diagnostics determine that certain components need to be replaced, the System/88 can automatically call a support center to report the problem. The customer benefits from quick repairs and low maintenance costs.

The System/88 is based generally upon processor systems of the type described in detail in U.S. Pat. No. 4,453,215, entitled "Central Processing Apparatus for Fault Tolerant Computing", issued Jun. 5, 1984 to Robert Reid, and related U.S. Pat. Nos. 4,486,826, 4,597,084, 4,654,857, 4,750,177 and 4,816,990; said patents are hereby incorporated herein by reference in their entirety as if they were set forth fully herein. Portions of the '215 Reid patent are shown diagrammatically in FIGS. 7 and 8 of the present application.

The computer system of FIGS. 7 and 8 of the present application has a processor module 10 with a processing unit 12, a random access storage unit 16, peripheral control units 20, 24, 32, and a single bus structure 30 which provides all information transfers between the several units of the module. The bus structure within each processor module includes duplicate partner buses A, B, and each functional unit 12, 16, 20, 24, 32 has an identical partner unit. Each unit, other than control units which operate with asynchronous peripheral devices, normally operates in lock-step synchronism with its partner unit. For example, the two partner memory units 16, 18 of a processor module normally both drive the two partner buses A, B, and are both driven by the bus structure 30, in full synchronism.

The computer system provides fault detection at the level of each functional unit within a processor module. To attain this feature, error detectors monitor hardware operations within each unit and check information transfers between the units. The detection of an error causes the processor module to isolate the bus or unit which caused the error from transferring information to other units, and the module continues operation. The continued operation employs the partner of the faulty bus or unit. Where the error detection precedes an information transfer, the continued operation can execute the transfer at the same time it would have occurred in the absence of the fault. Where the error detection coincides with an information transfer, the continued operation can repeat the transfer.

The computer system can effect the foregoing fault detection and remedial action rapidly, i.e., within a fraction of an operating cycle. At most a single information transfer is of questionable validity and must be repeated to ensure total data validity.
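The "at most one questionable transfer" behavior can be sketched as follows, assuming the case where error detection coincides with the transfer: the word goes onto the bus without waiting for the check, the fault is signaled in that same cycle, receivers discard that one word, and the partner repeats it. The function `run_transfers`, the `fault_at` parameter, and the XOR used to model a corrupted word are all hypothetical illustrations, not part of the described hardware.

```python
from typing import List, Optional, Tuple


def run_transfers(words: List[int],
                  fault_at: Optional[int] = None) -> Tuple[List[int], int]:
    """Deliver words over the bus from a unit whose partner holds the same data.

    fault_at is the (hypothetical) cycle on which the active unit's error
    detector fires while its transfer is already on the bus.
    Returns the words the receivers keep and the number of repeated transfers.
    """
    received: List[int] = []
    repeats = 0
    active = "unit"
    for cycle, word in enumerate(words):
        faulty = active == "unit" and cycle == fault_at
        # The transfer is not delayed for checking; a fault corrupts it here.
        received.append(word ^ 0xFF if faulty else word)
        if faulty:
            received.pop()         # receivers disregard the questionable transfer
            active = "partner"     # the faulty unit is isolated off-line
            received.append(word)  # the partner repeats the same transfer
            repeats += 1
    return received, repeats
```

Whatever cycle the fault lands on, the receivers end up with the correct sequence and exactly one repeated transfer, which mirrors the bound stated above.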

Although a processor module has significant hardware redundancy to provide fault-tolerant operation, a module that has no duplicate units is nevertheless fully operational.

The functional unit redundancy enables the module to continue operating in the event of a fault in any unit. In general, all units of a processor module operate continuously, and with selected synchronism, in the absence of any detected fault. Upon detection of an error-manifesting fault in any unit, that unit is isolated and placed off-line so that it cannot transfer information to other units of the module. The partner of the off-line unit continues operating, normally with essentially no interruption.

In addition to the partnered duplication of functional units within a module to provide fault-tolerant operation, each unit within a processor module generally has a duplicate of hardware which is involved in a data transfer. The purpose of this duplication, within a functional unit, is to test, independently of the other units, for faults within each unit. Other structure within each unit of a module, including the error detection structure, is in general not duplicated.

The common bus structure which serves all units of a processor module preferably employs a combination of the foregoing two levels of duplication and has three sets of conductors that form an A bus, a B bus that duplicates the A bus, and an X bus. The A and B buses each carry an identical set of cycle-definition, address, data, parity and other signals that can be compared to warn of erroneous information transfer between units. The conductors of the X bus, which are not duplicated, in general carry module-wide and other operating signals such as timing, error conditions, and electrical power. An additional C bus is provided for local communication between partnered units.

A processor module detects and locates a fault by a combination of techniques within each functional unit, including comparing the operation of duplicated sections of the unit, using parity and further error-checking and error-correcting codes, and monitoring operating parameters such as supply voltages. Each central processing unit has two redundant processing sections and, if their outputs do not agree, isolates the processing unit from transferring information to the bus structure. This isolates the other functional units of the processor module from any faulty information which may stem from the processing unit in question. Each processing unit also has a stage that provides virtual memory operation and is not duplicated; instead, the processing unit employs parity techniques to detect a fault in this stage.
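Parity protection of a non-duplicated stage, such as the virtual memory stage, can be sketched in a few lines. This assumes a stored entry is a single integer (for example a page-frame number) kept alongside one parity bit; `store_entry`, `load_entry`, and `ParityError` are hypothetical names, and the real stage's word width and entry format are not given here. Unlike duplication, parity detects any single-bit error but cannot correct it.

```python
class ParityError(Exception):
    """Raised when a stored word no longer matches its parity bit."""


def parity(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of 1 bits."""
    return bin(word).count("1") & 1


def store_entry(frame: int) -> tuple:
    """Keep the entry together with the parity bit computed at write time."""
    return frame, parity(frame)


def load_entry(entry: tuple) -> int:
    """Recheck parity on every read; a mismatch signals a fault in the stage."""
    frame, stored_parity = entry
    if parity(frame) != stored_parity:
        raise ParityError(f"parity mismatch on frame {frame:#x}")
    return frame
```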

The random access memory unit 16 is arranged with two non-redundant memory sections, each of which is arranged for the storage of different bytes of a memory word. The unit detects a fault both in each memory section and in the composite of the two sections, with an error-correcting code. Again, the error detector disables the memory unit from transferring potentially erroneous information onto the bus structure and hence to other units.
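The specific error-correcting code used by the memory unit is not given in this description, so the sketch below substitutes a textbook Hamming single-error-correcting code over a composite 16-bit word (the two memory sections would each hold one byte of that word). It only illustrates how an ECC both detects and locates a single flipped bit; `hamming_encode` and `hamming_decode` are hypothetical helpers, and the per-section checking described above is not modeled.

```python
from typing import List, Tuple


def hamming_encode(data: int, k: int = 16) -> List[int]:
    """Encode k data bits with a textbook Hamming single-error-correcting code.

    Returns code bits indexed 1..n (index 0 unused); positions that are powers
    of two hold check bits, all other positions hold data bits.
    """
    r = 0
    while (1 << r) < k + r + 1:  # smallest r satisfying the Hamming bound
        r += 1
    n = k + r
    code = [0] * (n + 1)
    bit = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two: a data position
            code[pos] = (data >> bit) & 1
            bit += 1
    for i in range(r):
        p = 1 << i
        # check bit p covers every other position whose index has bit i set
        code[p] = sum(code[q] for q in range(1, n + 1) if q & p and q != p) % 2
    return code


def hamming_decode(code: List[int]) -> Tuple[int, int]:
    """Correct at most one flipped bit; return (data, error_position or 0)."""
    code = code[:]  # work on a copy so the caller's word is untouched
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    if syndrome > n:
        raise ValueError("uncorrectable error")
    if syndrome:
        code[syndrome] ^= 1  # flip the single bit the syndrome points at
    data, bit = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):
            data |= code[pos] << bit
            bit += 1
    return data, syndrome
```

For a 16-bit word the loop chooses five check bits, giving a 21-bit code word; flipping any one of those bits makes the syndrome equal to its position, so the decoder can restore the original data.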

The memory unit 16 is also assigned the task of checking the duplicated bus conductors, i.e. the A bus and the B bus. For this purpose, the unit has parity checkers that test the address signals and that test the data signals on the bus structure. In addition, a comparator compares all signals on the A bus with all signals on the B bus. Upon determining in this manner that either bus is faulty, the memory unit signals other units of the module, by way of the X bus, to obey only the non-faulty bus.
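The memory unit's bus check can be sketched as a small decision function, assuming each bus sample carries an address, data, and one even-parity bit for each. `BusSample` and `choose_bus` are hypothetical names. The case where both buses pass parity yet disagree is an assumption of this sketch (it reports the fault as detected but not localized); the description above does not say how that case is resolved.

```python
from dataclasses import dataclass
from typing import Optional


def even_parity(value: int) -> int:
    """Return the parity bit that makes the total number of 1 bits even."""
    return bin(value).count("1") % 2


@dataclass(frozen=True)
class BusSample:
    address: int
    data: int
    addr_parity: int
    data_parity: int

    def parity_ok(self) -> bool:
        return (even_parity(self.address) == self.addr_parity
                and even_parity(self.data) == self.data_parity)


def choose_bus(a: BusSample, b: BusSample) -> Optional[str]:
    """Decide which bus the other units should obey (signaled over the X bus)."""
    a_ok, b_ok = a.parity_ok(), b.parity_ok()
    if a == b and a_ok:
        return "AB"  # buses compare equal and pass parity: obey both
    if a_ok and not b_ok:
        return "A"   # B is faulty: obey only the A bus
    if b_ok and not a_ok:
        return "B"   # A is faulty: obey only the B bus
    return None      # fault detected but not localized by this sketch
```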

Peripheral control units for a processor module employ a bus interface section for connection with the common bus structure, duplicate control sections termed "drive" and "check", and a peripheral interface section that communicates between the control sections and the peripheral input/output devices which the unit serves. There are disk control units 20, 22 for operation with disk memories 52a, 52b; communication control units 24, 26 for operation, through communication panels 50, with communication devices including terminals, printers and modems; and HSDI control units 32, 34 for interconnecting one processor module with another in a multiprocessor system. In each instance the bus interface section feeds input signals to the drive and check control sections from the A bus and/or the B bus, tests for logical errors in certain input signals from the bus structure, and tests the identity of signals output from the drive and check channels. The drive control section in each peripheral control unit provides control, address, status, and data manipulating functions appropriate for the I/O device which the unit serves. The check control section of the unit is essentially identical and serves to check the drive control section. The peripheral interface section of each control unit includes a combination of parity and comparator devices for testing the signals which pass between the control unit and the peripheral devices for errors.

A peripheral control unit which operates with a synchronous I/O device, such as a communication control unit 24, operates in lock-step synchronism with its partner unit. However, the partnered disk control units 20, 22 operate with different non-synchronized disk memories and accordingly operate with limited synchronism. The partner disk control units 20, 22 perform write operations concurrently but not in precise synchronism, inasmuch as the disk memories operate asynchronously of one another. The control unit 32 and its partner also typically operate with this limited degree of synchronism.
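Limited synchronism between the partner disk control units can be pictured with a small `asyncio` sketch: both writes start together, but each completes when its own drive is ready. The latencies, the dictionaries standing in for disk memories 52a and 52b, and the rule that the duplexed write counts as done only after both copies land are assumptions for illustration; `disk_write` and `duplexed_write` are hypothetical names.

```python
import asyncio


async def disk_write(unit: str, disk: dict, block: int, data: bytes,
                     latency: float, completed: list) -> None:
    # The sleep stands in for seek and rotation on an unsynchronized drive.
    await asyncio.sleep(latency)
    disk[block] = data
    completed.append(unit)  # record the order in which the units finish


async def duplexed_write(block: int, data: bytes,
                         disk_a: dict, disk_b: dict) -> list:
    completed: list = []
    # Both partner control units start the write concurrently ...
    await asyncio.gather(
        disk_write("unit 20", disk_a, block, data, 0.02, completed),
        disk_write("unit 22", disk_b, block, data, 0.01, completed),
    )
    # ... but finish at different times; the write is treated as complete
    # only once both copies are on disk (an assumption of this sketch).
    return completed
```

Running `asyncio.run(duplexed_write(7, b"rec", {}, {}))` returns `["unit 22", "unit 20"]`: the writes overlap in time yet do not finish in lock step.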

The power supply unit for a module employs two bulk power supplies, each of which provides operating power to only one unit in each pair of partner units. Thus, one bulk supply feeds one duplicated portion of the bus structure, one of two partner central processing units, one of two partner memory units, and one unit in each pair of peripheral control units. The bulk supplies also provide electrical power for non-duplicated units of the processor module. Each unit of the module has a power supply stage which receives operating power from one bulk supply and in turn develops the operating voltages which that unit requires. This power stage in addition monitors the supply voltages. Upon detecting a failing supply voltage, the power stage produces a signal that clamps to ground potential all output lines from that unit to the bus structure. This action precludes a power failure at any unit from causing the transmission of faulty information to the bus structure.
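The power stage's clamping behavior reduces to a simple rule, sketched here under assumed values: a 5 V nominal supply with a hypothetical plus or minus 5% acceptance window (the actual voltages and thresholds are not given in this description). When the monitored voltage leaves the window, every output line to the bus is forced to 0, so a failing unit cannot drive faulty information onto the bus. `bus_outputs` is a hypothetical name.

```python
from typing import List


def bus_outputs(signals: List[int], supply_volts: float,
                nominal: float = 5.0, tolerance: float = 0.05) -> List[int]:
    """Pass the unit's output lines through, or clamp them all to ground."""
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    if not (low <= supply_volts <= high):
        return [0] * len(signals)  # failing supply: clamp every line to ground
    return list(signals)
```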

Some units of the processor module execute each information transfer with an operating cycle that includes an error-detecting timing phase prior to the actual information transfer. A unit which provides this operation, e.g. a control unit for a peripheral device, thus tests for a fault condition prior to effecting an information transfer. The unit inhibits the information transfer in the event a fault is detected. The module, however, can continue operation--without interruption or delay--and effect the information transfer from the non-inhibited partner unit.
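The pre-transfer checking mode can be sketched as a two-phase cycle, assuming each partner unit's error-detecting phase has already produced a simple fault flag. In the first phase a unit that detects a fault inhibits itself; in the second, the transfer proceeds in the same cycle from whichever partner was not inhibited. `PartnerUnit` and `transfer_cycle` are hypothetical names; the unit labels reuse the communication control unit numbers 24 and 26 only for readability.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartnerUnit:
    name: str
    data: int
    fault: bool = False  # outcome of this unit's error-detecting phase


def transfer_cycle(pair: List[PartnerUnit]) -> Optional[int]:
    # Phase 1: each unit checks itself before driving the bus; a unit that
    # detects a fault inhibits its own transfer.
    drivers = [u for u in pair if not u.fault]
    # Phase 2: the transfer happens in the same cycle from any non-inhibited
    # partner, so a single fault costs no time.
    if not drivers:
        return None  # both partners inhibited: nothing can be transferred
    return drivers[0].data
```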

Other units of the processor module, generally including at least the central processing unit and the memory unit, for which operating time is of more importance, execute each information transfer concurrently with the error detection pertinent to that transfer. In the event a fault is detected, the unit immediately produces a signal which alerts other processing units to disregard the immediately preceding information transfer. The processor module can repeat the information transfer from the partner of the unit which reported a fault condition. This manner of operation produces optimum operating speed in that each information transfer is executed without delay for the purpose of error detection. A delay only arises in the relatively few instances where a fault is detected. A bus arbitration means is provided to determin