United States Patent 6,173,346
Wallach et al.
January 9, 2001

Title

Method for hot swapping a programmable storage adapter using a programmable processor for selectively enabling or disabling power to adapter slot in response to respective request signals

Abstract

A software architecture for the hot add and swap of adapters. The software architecture allows users to replace failed components, upgrade outdated components, and add new functionality, such as new network interfaces, disk interface adapters and storage, without impacting existing users. The software architecture supports the hot add and swap of off-the-shelf adapters, including those adapters that are programmable.


Inventors: Wallach; Walter August (Los Altos, CA), Khalili; Mehrdad (San Jose, CA), Mahalingam; Mallikarjunan (Santa Clara, CA), Reed; John M. (Morgan Hill, CA)
Assignee: Micron Electronics, Inc. (Nampa, ID)
Appl. No.: 08/942,458
Filed: October 1, 1997

Current U.S. Class: 710/302; 710/46; 710/5; 710/72; 710/8; 710/100; 710/17; 710/18
Field of Search: 395/283, 182.05, 280, 281, 308, 309, 830, 182.6, 282, 884, 311, 500, 653, 828; 710/100, 101, 103, 128, 129, 131, 64, 8, 5, 17, 18, 46, 72; 714/7-8, 64

Primary Examiner: An; Meng-Ai T.
Assistant Examiner: El-Hady; Nabil
Attorney, Agent or Firm: Knobbe, Martens, Olson & Bear, LLP

Parent Case Text



RELATED APPLICATIONS

This application is related to the following, all of which were filed concurrently herewith on Oct. 1, 1997: U.S. application Ser. No. 08/942,309, entitled "HOT ADD OF DEVICES SOFTWARE ARCHITECTURE"; U.S. application Ser. No. 08/942,282, entitled "Apparatus For Computer Implemented Hot-Swap And Hot-Add"; U.S. application Ser. No. 08/941,970, entitled "Method For Computer Implemented Hot-Swap And Hot-Add"; U.S. application Ser. No. 08/942,306, entitled "METHOD FOR THE HOT ADD OF DEVICES"; U.S. application Ser. No. 08/942,311, entitled "HOT SWAP OF DEVICES SOFTWARE ARCHITECTURE"; U.S. application Ser. No. 08/942,457, entitled "METHOD FOR THE HOT SWAP OF DEVICES"; U.S. Pat. No. 5,892,928, entitled "METHOD FOR THE HOT ADD OF A NETWORK ADAPTER ON A SYSTEM INCLUDING A DYNAMICALLY LOADED ADAPTER DRIVER", issued on Apr. 6, 1999; U.S. application Ser. No. 08/942,069, entitled "METHOD FOR THE HOT ADD OF A MASS STORAGE ADAPTER ON A SYSTEM INCLUDING A STATICALLY LOADED ADAPTER DRIVER"; U.S. application Ser. No. 08/942,465, entitled "METHOD FOR THE HOT ADD OF A NETWORK ADAPTER ON A SYSTEM INCLUDING A STATICALLY LOADED ADAPTER DRIVER"; U.S. application Ser. No. 08/962,963, entitled "METHOD FOR THE HOT ADD OF A MASS STORAGE ADAPTER ON A SYSTEM INCLUDING A DYNAMICALLY LOADED ADAPTER DRIVER"; U.S. Pat. No. 5,889,965, entitled "METHOD FOR THE HOT SWAP OF A NETWORK ADAPTER ON A SYSTEM INCLUDING A DYNAMICALLY LOADED ADAPTER DRIVER", issued on Mar. 30, 1999; U.S. application Ser. No. 08/942,336, entitled "METHOD FOR THE HOT SWAP OF A MASS STORAGE ADAPTER ON A SYSTEM INCLUDING A STATICALLY LOADED ADAPTER DRIVER"; and U.S. application Ser. No. 08/942,459, entitled "METHOD FOR THE HOT SWAP OF A NETWORK ADAPTER ON A SYSTEM INCLUDING A STATICALLY LOADED ADAPTER DRIVER".

Claims


What is claimed is:
1. A method of hot swapping a standard programmable mass storage adapter connected to an operational computer, comprising:
providing a hot plug hardware in the operational computer, the hot plug hardware being configured to enable and disable power to the standard programmable mass storage adapter, wherein the operational computer has at least one programmable data processor for receiving requests from a central processing unit and for controlling the power to an adapter slot in response to requests from the central processing unit, and determining whether a received request is to disable or enable power;
receiving a hot swap request from a user interface program for the hot swap of the standard programmable mass storage adapter, wherein the hot swap request causes suspension of communications and power disablement to the standard programmable mass storage adapter;
receiving a request for the suspension of all input/output (I/O) communications to the standard programmable mass storage adapter;
requesting the operating system to suspend all communications to the standard programmable mass storage adapter;
waiting for the completion of any pending I/O communications to the standard programmable mass storage adapter;
notifying the requester that all I/O is suspended;
disabling power to the standard programmable mass storage adapter, wherein the disabling occurs under the control of the hot plug hardware and wherein the programmable data processor determines whether the received request is to disable power;
removing the standard programmable mass storage adapter;
inserting a new standard programmable mass storage adapter into the operational computer;
enabling power to the new standard programmable mass storage adapter, wherein the enabling occurs under the control of the hot plug hardware and wherein the programmable data processor determines whether the received request is to enable power;
programming the new standard programmable mass storage adapter to have at least a portion of the same configuration information as the removed standard programmable mass storage adapter; and
restarting communications between the operational computer and the new standard programmable mass storage adapter.
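The sequence of claim 1 can be modeled in a few lines. The following is a minimal Python sketch under stated assumptions, not the patentee's implementation: `PowerRequest`, `HotPlugProcessor`, `Adapter` and `hot_swap` are hypothetical names, slot power is a dictionary flag rather than real hot plug hardware, and the I/O suspension steps are only recorded in a log. It illustrates the two points the claim turns on: the programmable data processor decides whether each request is an enable or a disable, and configuration taken from the removed adapter is programmed into its replacement before I/O restarts.

```python
from dataclasses import dataclass, field
from enum import Enum


class PowerRequest(Enum):
    ENABLE = "enable"
    DISABLE = "disable"


@dataclass
class HotPlugProcessor:
    """Hypothetical model of the programmable data processor: it receives
    power requests from the CPU and switches slot power accordingly."""
    slot_power: dict = field(default_factory=dict)

    def handle(self, slot: int, request: PowerRequest) -> None:
        # Determine whether the received request is to disable or enable power.
        if request is PowerRequest.DISABLE:
            self.slot_power[slot] = False
        elif request is PowerRequest.ENABLE:
            self.slot_power[slot] = True
        else:
            raise ValueError(f"unsupported request: {request!r}")


@dataclass
class Adapter:
    serial: str
    config: dict = field(default_factory=dict)


def hot_swap(slots, slot, new_adapter, processor, log):
    """Replace the adapter in slots[slot], following the order of claim 1.
    `log` collects step names so the ordering can be inspected."""
    old = slots[slot]
    saved_config = dict(old.config)          # configuration of the removed adapter
    log.append("suspend I/O")                # OS stops issuing new requests
    log.append("wait for pending I/O")       # e.g. the counter loop of claims 3-5
    log.append("notify requester")
    processor.handle(slot, PowerRequest.DISABLE)
    log.append("power off")
    slots[slot] = new_adapter                # physical removal and insertion
    processor.handle(slot, PowerRequest.ENABLE)
    log.append("power on")
    new_adapter.config.update(saved_config)  # program the new adapter
    log.append("restart I/O")
    return old
```

Copying the whole saved configuration is the simplest case; the claim only requires that at least a portion of it be carried over to the new adapter.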

2. The method of claim 1, additionally comprising storing configuration information that is associated with the standard programmable mass storage adapter.

3. The method of claim 1, wherein waiting for the completion of any pending I/O communications to the standard programmable mass storage adapter includes:
(i) using a counter to define a waiting period;
(ii) requesting the number of the pending I/O communications from an operating system; and
(iii) decrementing the counter, responsive to the requesting.

4. The method of claim 3, wherein waiting for the completion of any pending I/O communications to the standard programmable mass storage adapter additionally includes:
receiving the number of pending I/O communications from the operating system; and
ending the waiting period when the number of pending I/O communications is equal to zero.

5. The method of claim 3, wherein waiting for the completion of any pending I/O communications to the standard programmable mass storage adapter additionally includes:
receiving from the operating system the number of pending I/O communications; and
repeating steps (i), (ii), and (iii) if the number of pending I/O communications is greater than zero.
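The counter-bounded wait of claims 3 to 5 amounts to a polling loop. A minimal Python sketch, assuming a hypothetical callback `get_pending_count` that stands in for the operating system query for requests still outstanding to the adapter; `wait_for_pending_io`, `max_polls` and `poll_interval` are illustrative names, not taken from the patent:

```python
import time
from typing import Callable


def wait_for_pending_io(get_pending_count: Callable[[], int],
                        max_polls: int,
                        poll_interval: float = 0.0) -> bool:
    """Poll until the OS reports no outstanding I/O for the adapter.

    Returns True if the count reached zero, or False if the counter
    expired first.
    """
    counter = max_polls                  # (i) the counter defines the waiting period
    while counter > 0:
        pending = get_pending_count()    # (ii) ask the OS how many requests are pending
        counter -= 1                     # (iii) decrement in response to the request
        if pending == 0:
            return True                  # claim 4: end the wait at zero
        time.sleep(poll_interval)        # claim 5: pending > 0, so poll again
    return False
```

Returning `False` when the counter runs out lets the caller abort the swap rather than cut power with requests still outstanding; the claims themselves do not say what happens in that case.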

6. The method of claim 1, wherein restarting communications includes sending an input/output packet to a filter custom device module that is associated with the new programmable mass storage adapter.

7. A method of hot swapping a standard mass storage adapter connected to an operational computer, comprising:
providing a hot plug hardware in the operational computer, the hot plug hardware being configured to enable and disable power to the standard mass storage adapter, wherein the operational computer has at least one programmable data processor for receiving requests from a central processing unit and for controlling the power to the adapter in response to requests from the central processing unit, and determining whether a received request is to disable or enable power;
receiving a hot swap request from a user interface program for the hot swap of the standard mass storage adapter, wherein the hot swap request causes suspension of communications and power disablement to the standard mass storage adapter;
receiving a request for the suspension of all I/O communications to the standard mass storage adapter;
requesting the operating system to suspend all communications to the standard mass storage adapter;
waiting for the completion of any pending I/O communications to the standard mass storage adapter;
notifying the requester that all I/O is suspended;
disabling power to the standard mass storage adapter, wherein the disabling occurs under the control of the hot plug hardware and wherein the programmable data processor determines whether the received request is to disable power;
removing the standard mass storage adapter from the operational computer;
inserting a new standard mass storage adapter into the operational computer at the same location formerly occupied by the mass storage adapter;
enabling power to the new standard mass storage adapter, wherein the enabling occurs under the control of and within the operational computer and wherein the programmable data processor determines whether the received request is to enable power;
programming the new standard mass storage adapter to have at least a portion of the same configuration information as the removed standard mass storage adapter; and
restarting communications between the operational computer and the new standard mass storage adapter.

8. The method of claim 7, additionally comprising storing configuration information that is associated with the standard mass storage adapter.

9. The method of claim 7, wherein waiting for the completion of any pending I/O communications to the standard mass storage adapter includes:
(i) using a counter to define a waiting period;
(ii) requesting the number of the pending I/O communications from an operating system; and
(iii) decrementing the counter, responsive to the requesting.

10. The method of claim 9, wherein waiting for the completion of any pending I/O communications to the standard mass storage adapter additionally includes:
receiving the number of pending I/O communications from the operating system; and
ending the waiting period when the number of pending I/O communications is equal to zero.

11. The method of claim 9, wherein waiting for the completion of any pending I/O communications to the standard mass storage adapter additionally includes:
receiving the number of pending I/O communications from the operating system; and
repeating steps (i), (ii), and (iii) if the number of pending I/O communications is greater than zero.

12. The method of claim 7, wherein restarting communications includes sending an input/output packet to a filter custom device module that is associated with the new mass storage adapter.

13. A method of hot swapping a standard mass storage adapter connected to an operational computer including at least one canister, wherein the canister connects to one or more existing adapters, comprising:
providing a hot plug hardware in the operational computer, the hot plug hardware being configured to enable and disable power to the standard mass storage adapter, wherein the operational computer has at least one programmable data processor for receiving requests from a central processing unit and for controlling the power to the adapter in response to requests from the central processing unit, and determining whether a received request is to disable or enable power;
receiving a hot swap request from a user interface program for the hot swap of the standard mass storage adapter, wherein the hot swap request causes suspension of communications and power disablement to the standard mass storage adapter;
receiving a request for the suspension of all input/output (I/O) communications to one or more adapters that are connected to one of the canisters that contains the standard mass storage adapter;
requesting the operating system to suspend all communications to the adapters;
waiting for the completion of any pending I/O communications to the adapters;
notifying the requester that all I/O is suspended;
disabling power to the selected canister with the standard adapters, while maintaining power to the computer and other adapters, wherein the disabling occurs under the control of the hot plug hardware and wherein the programmable data processor determines whether the received request is to disable power;
disconnecting the selected canister from the operational computer;
removing a selected one of the standard mass storage adapters from the canister;
adding a new standard mass storage adapter in the canister at the same location formerly occupied by the selected mass storage adapter;
connecting the selected canister to the operational computer;
enabling power to the adapters in the canister, wherein the enabling occurs under the control of and within the operational computer and wherein the programmable data processor determines whether the received request is to enable power;
restarting communications to the adapters; and
restarting communications between the operational computer and the new standard mass storage adapter.
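Claim 13 differs from claim 1 in that power is removed from a whole canister, so every adapter in that canister must be quiesced first. A minimal Python sketch, assuming a hypothetical `Canister` model and `swap_in_canister` helper (neither name comes from the patent), that records which canister is cycled so one can check that the others are left alone:

```python
from dataclasses import dataclass


@dataclass
class Canister:
    """Hypothetical model: several adapters sharing one canister power feed."""
    name: str
    adapters: list
    powered: bool = True
    connected: bool = True
    io_suspended: bool = False


def swap_in_canister(canisters, target, index, new_adapter, events):
    """Replace canisters[target].adapters[index] while every other
    canister (and the rest of the computer) stays powered."""
    can = canisters[target]
    can.io_suspended = True              # suspend I/O to all adapters in the canister
    events.append(("suspend", target))
    can.powered = False                  # only this canister loses power
    events.append(("power_off", target))
    can.connected = False                # disconnect the canister
    old = can.adapters[index]
    can.adapters[index] = new_adapter    # same position as the removed adapter
    can.connected = True                 # reconnect the canister
    can.powered = True
    events.append(("power_on", target))
    can.io_suspended = False             # restart communications to the adapters
    events.append(("resume", target))
    return old
```

An adapter that is not being replaced but shares the canister (for example a network adapter next to the mass storage adapter) also loses power briefly, which is why the claim suspends I/O to every adapter in the canister rather than only the one being swapped.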

14. The method of claim 13, additionally comprising storing configuration information that is associated with the standard mass storage adapter.

15. The method of claim 13, wherein waiting for the completion of any pending I/O communications to the adapters includes:
(i) using a counter to define a waiting period;
(ii) requesting the number of the pending I/O communications from an operating system; and
(iii) decrementing the counter, responsive to the requesting.

16. The method of claim 15, wherein waiting for the completion of any pending I/O communications to the adapters additionally includes:
receiving the number of pending I/O communications from the operating system; and
ending the waiting period when the number of pending I/O communications is equal to zero.

17. The method of claim 15, wherein waiting for the completion of any pending I/O communications to the adapters additionally includes:
receiving the number of pending I/O communications from the operating system; and
repeating steps (i), (ii), and (iii) if the number of pending I/O communications is greater than zero.

18. The method of claim 13, wherein restarting communications includes sending an input/output packet to a filter custom device module that is associated with the new standard mass storage adapter.

Description

INCORPORATION BY REFERENCE OF COMMONLY OWNED APPLICATIONS

The following patent applications, commonly owned and filed on the same day as the present application, are hereby incorporated herein in their entirety by reference thereto:

Title; Application Ser. No.; Attorney Docket No.
"System Architecture for Remote Access and Control of Environmental Management"; 08/942,160; MNFRAME.002A1
"Method of Remote Access and Control of Environmental Management"; 08/942,215; MNFRAME.002A2
"System for Independent Powering of Diagnostic Processes on a Computer System"; 08/942,410; MNFRAME.002A3
"Method of Independent Powering of Diagnostic Processes on a Computer System"; 08/942,320; MNFRAME.002A4
"Diagnostic and Managing Distributed Processor System"; 08/942,402; MNFRAME.005A1
"Method for Managing a Distributed Processor System"; 08/942,448; MNFRAME.005A2
"System for Mapping Environmental Resources to Memory for Program Access"; 08/942,222; MNFRAME.005A3
"Method for Mapping Environmental Resources to Memory for Program Access"; 08/942,214; MNFRAME.005A4
"Hot Add of Devices Software Architecture"; 08/942,309; MNFRAME.006A1
"Method for The Hot Add of Devices"; 08/942,306; MNFRAME.006A2
"Hot Swap of Devices Software Architecture"; 08/942,311; MNFRAME.006A3
"Method for The Hot Swap of Devices"; 08/942,457; MNFRAME.006A4
"Method for the Hot Add of a Network Adapter on a System Including a Dynamically Loaded Adapter Driver"; 08/943,072; MNFRAME.006A5 (now U.S. Pat. No. 5,892,928, issued April 6, 1999)
"Method for the Hot Add of a Mass Storage Adapter on a System Including a Statically Loaded Adapter Driver"; 08/942,069; MNFRAME.006A6
"Method for the Hot Add of a Network Adapter on a System Including a Statically Loaded Adapter Driver"; 08/942,465; MNFRAME.006A7
"Method for the Hot Add of a Mass Storage Adapter on a System Including a Dynamically Loaded Adapter Driver"; 08/962,963; MNFRAME.006A8
"Method for the Hot Swap of a Network Adapter on a System Including a Dynamically Loaded Adapter Driver"; 08/943,078; MNFRAME.006A9 (now U.S. Pat. No. 5,889,965, issued March 30, 1999)
"Method for the Hot Swap of a Mass Storage Adapter on a System Including a Statically Loaded Adapter Driver"; 08/942,336; MNFRAME.006A10
"Method for the Hot Swap of a Network Adapter on a System Including a Statically Loaded Adapter Driver"; 08/942,459; MNFRAME.006A11
"Method for the Hot Swap of a Mass Storage Adapter on a System Including a Dynamically Loaded Adapter Driver"; 08/942,458; MNFRAME.006A12
"Method of Performing an Extensive Diagnostic Test in Conjunction with a BIOS Test Routine"; 08/942,463; MNFRAME.008A
"Apparatus for Performing an Extensive Diagnostic Test in Conjunction with a BIOS Test Routine"; 08/942,163; MNFRAME.009A
"Configuration Management Method for Hot Adding and Hot Replacing Devices"; 08/941,268; MNFRAME.010A
"Configuration Management System for Hot Adding and Hot Replacing Devices"; 08/942,408; MNFRAME.011A
"Apparatus for Interfacing Buses"; 08/942,382; MNFRAME.012A
"Method for Interfacing Buses"; 08/942,413; MNFRAME.013A
"Computer Fan Speed Control Device"; 08/942,447; MNFRAME.016A
"Computer Fan Speed Control Method"; 08/942,216; MNFRAME.017A
"System for Powering Up and Powering Down a Server"; 08/943,076; MNFRAME.018A
"Method of Powering Up and Powering Down a Server"; 08/943,077; MNFRAME.019A
"System for Resetting a Server"; 08/942,333; MNFRAME.020A
"Method of Resetting a Server"; 08/942,405; MNFRAME.021A
"System for Displaying Flight Recorder"; 08/942,070; MNFRAME.022A
"Method of Displaying Flight Recorder"; 08/942,068; MNFRAME.023A
"Synchronous Communication Interface"; 08/943,355; MNFRAME.024A
"Synchronous Communication Emulation"; 08/942,004; MNFRAME.025A
"Software System Facilitating the Replacement or Insertion of Devices in a Computer System"; 08/942,317; MNFRAME.026A
"Method for Facilitating the Replacement or Insertion of Devices in a Computer System"; 08/942,316; MNFRAME.027A
"System Management Graphical User Interface"; 08/943,357; MNFRAME.028A
"Display of System Information"; 08/942,195; MNFRAME.029A
"Data Management System Supporting Hot Plug Operations on a Computer"; 08/942,129; MNFRAME.030A
"Data Management Method Supporting Hot Plug Operations on a Computer"; 08/942,124; MNFRAME.031A
"Alert Configurator and Manager"; 08/942,005; MNFRAME.032A
"Managing Computer System Alerts"; 08/943,356; MNFRAME.033A
"Computer Fan Speed Control System"; 08/940,301; MNFRAME.034A
"Computer Fan Speed Control System Method"; 08/941,267; MNFRAME.035A
"Black Box Recorder for Information System Events"; 08/942,381; MNFRAME.036A
"Method of Recording Information System Events"; 08/942,164; MNFRAME.037A
"Method for Automatically Reporting a System Failure in a Server"; 08/942,168; MNFRAME.040A
"System for Automatically Reporting a System Failure in a Server"; 08/942,384; MNFRAME.041A
"Expansion of PCI Bus Loading Capacity"; 08/942,404; MNFRAME.042A
"Method for Expanding PCI Bus Loading Capacity"; 08/942,223; MNFRAME.043A
"System for Displaying System Status"; 08/942,347; MNFRAME.044A
"Method of Displaying System Status"; 08/942,071; MNFRAME.045A
"Fault Tolerant Computer System"; 08/942,194; MNFRAME.046A
"Method for Hot Swapping of Network Components"; 08/943,044; MNFRAME.047A
"A Method for Communicating a Software Generated Pulse Waveform Between Two Servers in a Network"; 08/942,221; MNFRAME.048A
"A System for Communicating a Software Generated Pulse Waveform Between Two Servers in a Network"; 08/942,409; MNFRAME.049A
"Method for Clustering Software Applications"; 08/942,318; MNFRAME.050A
"System for Clustering Software Applications"; 08/942,411; MNFRAME.051A
"Method for Automatically Configuring a Server after Hot Add of a Device"; 08/942,319; MNFRAME.052A
"System for Automatically Configuring a Server after Hot Add of a Device"; 08/942,331; MNFRAME.053A
"Method of Automatically Configuring and Formatting a Computer System and Installing Software"; 08/942,412; MNFRAME.054A
"System for Automatically Configuring and Formatting a Computer System and Installing Software"; 08/941,955; MNFRAME.055A
"Determining Slot Numbers in a Computer"; 08/942,462; MNFRAME.056A
"System for Detecting Errors in a Network"; 08/942,169; MNFRAME.058A
"Method of Detecting Errors in a Network"; 08/940,302; MNFRAME.059A
"System for Detecting Network Errors"; 08/942,407; MNFRAME.060A
"Method of Detecting Network Errors"; 08/942,573; MNFRAME.061A

U.S. application Ser. No. 08/942,282, entitled "Apparatus For Computer Implemented Hot-Swap And Hot-Add," and U.S. application Ser. No. 08/941,970, entitled "Method For Computer Implemented Hot-Swap And Hot-Add".

PRIORITY CLAIM

The benefit under 35 U.S.C. §119(e) of the following U.S. provisional application(s) is hereby claimed:

Title / Application Ser. No. / Filing Date
"Hardware and Software Architecture for Inter-Connecting an Environmental Management System with a Remote Interface" 60/047,016 May 13, 1997
"Self Management Protocol for a Fly-By-Wire Service Processor" 60/046,416 May 13, 1997
"Hot Plug Software Architecture for Off the Shelf Operating Systems" 60/046,311 May 13, 1997
"Computer System Hardware Infrastructure for Hot Plugging Single and Multi-Function PCI Cards Without Embedded Bridges" 60/046,398 May 13, 1997
"Computer System Hardware Infrastructure for Hot Plugging Multi-Function PCI Cards With Embedded Bridges" 60/046,312 May 13, 1997

COPYRIGHT RIGHTS

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

FIELD OF THE INVENTION

The field of the invention relates to I/O adapters in computer systems. More particularly, the field of the invention relates to the hot add and swap of adapters on a computer system.

DESCRIPTION OF THE RELATED TECHNOLOGY

As enterprise-class servers, which are central computers in a network that manage common data, become more powerful and more capable, they are also becoming ever more sophisticated and complex. For many companies, these changes lead to concerns over server reliability and manageability, particularly in light of the increasingly critical role of server-based applications. While in the past many systems administrators were comfortable with all of the various components that made up a standards-based network server, today's generation of servers can appear as an incomprehensible, unmanageable black box. Without visibility into the underlying behavior of the system, the administrator must "fly blind." Too often, the only indicator the network manager has of the relative health of a particular server is whether or not it is running.

It is widely acknowledged that most standards-based servers lack reliability and availability. Server downtime, resulting either from hardware or software faults or from regular maintenance, continues to be a significant problem. By one estimate, the cost of downtime in mission critical environments has risen to an annual total of $4.0 billion for U.S. businesses, with the average downtime event resulting in a $140 thousand loss in the retail industry and a $450 thousand loss in the securities industry. It has been reported that companies lose as much as $250 thousand in employee productivity for every 1% of computer downtime. With emerging Internet, intranet and collaborative applications taking on more essential business roles every day, the cost of network server downtime will continue to spiral upward.

A significant component of cost is hiring administration personnel. These costs decline dramatically when computer systems can be managed using a common set of tools and do not require immediate attention when a failure occurs. When a computer system can continue to operate even when components fail, and repair can be deferred until a later time, administration costs become more manageable and predictable.

While hardware fault tolerance is an important element of an overall high availability architecture, it is only one piece of the puzzle. Studies show that a significant percentage of network server downtime is caused by transient faults in the I/O subsystem. These faults may be due, for example, to the device driver, the device firmware, or hardware that does not properly handle concurrent errors, and they often cause servers to crash or hang. The result is hours of downtime per failure, while a system administrator discovers the failure, takes some action, and manually reboots the server. In many cases, data volumes on hard disk drives become corrupt and must be repaired when the volume is mounted. A dismount-and-mount cycle may result from the lack of "hot pluggability" or "hot plug" in current standards-based servers. Hot plug refers to the addition and swapping of peripheral adapters in an operational computer system. Diagnosing intermittent errors can be a frustrating and time-consuming process. For a system to deliver consistently high availability, it must be resilient to these types of faults.

Existing systems also do not have an interface to control the changing or addition of an adapter. Since any user on a network could be using a particular adapter on the server, system administrators need a software application that will control the flow of communications to an adapter before, during, and after a hot plug operation on an adapter.

Current operating systems do not by themselves provide the support users need to hot add and swap an adapter. System users need software that will freeze and resume the communications of their adapters in a controlled fashion. The software needs to support the hot add of various peripheral adapters such as mass storage and network adapters. Additionally, the software should support adapters that are designed for various bus systems such as Peripheral Component Interconnect (PCI), CardBus, Microchannel, Industry Standard Architecture (ISA), and Extended ISA (EISA). System users also need software to support the hot add and swap of canisters and multi-function adapter cards, which are plug-in cards having more than one adapter.

In a typical PC-based server, upon the failure of an adapter, which is a printed circuit board containing microchips, the server must be powered down, the new adapter and adapter driver installed, the server powered back up and the operating system reconfigured.

Various entities have, however, tried to implement hot plugging of these adapters in fault tolerant computer systems. One significant difficulty in designing a hot plug system is protecting the circuitry contained on the adapter from being short-circuited when an adapter is added to a powered system. Typically, an adapter contains edge connectors which are located on one side of the printed circuit board. These edge connectors allow power to transfer from the system bus to the adapter, and also supply data paths between the bus and the adapter. These edge connectors fit into a slot on the bus of the computer system. A traditional hardware solution for "hot plug" systems increases the length of at least one ground contact of the adapter, so that the ground contact on the edge connector is the first connector to contact the bus on insertion of the I/O adapter and the last connector to contact the bus on removal of the adapter. An example of such a solution is described in U.S. Pat. No. 5,210,855 to Thomas M. Bartol.

U.S. Pat. No. 5,579,491 to Jeffries discloses an alternative solution to the hot installation of I/O adapters. Here, each hotly installable adapter is configured with a user actuable initiator to request the hot removal of an adapter. The I/O adapter is first physically connected to a bus on the computer system. Subsequent to such connection a user toggles a switch on the I/O adapter which sends a signal to the bus controller. The signal indicates to the bus controller that the user has added an I/O adapter. The bus controller then alerts the user through a light emitting diode (LED) whether the adapter can be installed on the bus.

The invention disclosed in the Jeffries patent, however, also has several limitations. It requires physical modification of the adapter to be hotly installed. Another limitation is that the Jeffries patent does not teach the hot addition of new adapter controllers or bus systems. Moreover, the Jeffries patent requires that before an I/O adapter is removed, another I/O adapter must be either free and spare or free and redundant. Therefore, if there is no free adapter, hot removal of an adapter is impossible until the user adds another adapter to the computer system.

A related technology, not to be confused with hot plug systems, is Plug and Play, defined by Microsoft and PC product vendors. Plug and Play is an architecture that facilitates the integration of PC hardware adapters into systems. Plug and Play adapters are able to identify themselves to the computer system after the user installs the adapter on the bus. Plug and Play adapters are also able to identify the hardware resources that they need for operation. Once this information is supplied to the operating system, the operating system can load the adapter drivers for the adapter that the user added while the system was in a non-powered state. Plug and Play is used by both Windows 95 and Windows NT to configure adapter cards at boot time. Plug and Play is also used by Windows 95 to configure devices in a docking station when a powered-on notebook computer is inserted into or removed from the docking station.

Therefore, a need exists for improvements in server management which will result in continuous operation despite adapter failures. System users must be able to replace failed components, upgrade outdated components, and add new functionality, such as new network interfaces, disk interface adapters and storage, without impacting existing users. Additionally, system users need a process to hot add their legacy adapters, without purchasing new adapters that are specifically designed for hot plug. As system demands grow, organizations must frequently expand, or scale, their computing infrastructure, adding new processing power, memory, mass storage and network adapters. With demand for 24-hour access to critical, server-based information resources, planned system downtime for system service or expansion has become unacceptable.

SUMMARY OF THE INVENTION

Embodiments of the inventive software architecture allow users to replace failed components, upgrade outdated components, and add new functionality, such as new network interfaces, disk interface adapters and storage, without impacting existing users. The software architecture supports the hot add and swap of off-the-shelf adapters, including those adapters that are programmable.

One embodiment of the invention includes a method of hot swapping an adapter connected to an operational computer, comprising: receiving a request for the suspension of all I/O communications to an existing programmable mass storage adapter, requesting the operating system to suspend all communications to the existing programmable mass storage adapter, waiting for the completion of any pending I/O communications to the existing programmable mass storage adapter, notifying the requester that all I/O is suspended, removing the programmable mass storage adapter, inserting a new programmable mass storage adapter into the computer and restarting communications between the computer and the new programmable mass storage adapter.
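The ordering in this embodiment is what makes the swap safe: new I/O is blocked first, outstanding I/O is drained, and only then is the requester told that the adapter may be pulled. The Python sketch below models that ordering. `Adapter`, `hot_swap` and the `drain` callback are hypothetical names introduced for illustration, and the polling loop merely stands in for whatever completion signal a real operating system provides.

```python
from dataclasses import dataclass


@dataclass
class Adapter:
    """Hypothetical model of a programmable mass storage adapter."""
    name: str
    pending_io: int = 0      # requests issued to the adapter but not yet completed
    suspended: bool = False  # True once the OS has stopped issuing new I/O


def hot_swap(old, new, drain, log):
    """Replace `old` with `new` in the order of steps described above.

    `drain` is a hypothetical callback that lets one in-flight request
    finish; it stands in for the operating system's I/O completion path.
    """
    log.append(f"suspend requested for {old.name}")        # request received
    old.suspended = True                                    # OS stops new I/O
    while old.pending_io > 0:                               # wait for pending I/O
        drain(old)
    log.append(f"all I/O suspended on {old.name}")         # notify the requester
    log.append(f"{old.name} removed, {new.name} inserted")  # physical swap
    new.suspended = False                                   # restart communications
    log.append(f"I/O restarted on {new.name}")
    return new


def finish_one_request(adapter):
    adapter.pending_io -= 1


events = []
old = Adapter("scsi0", pending_io=3)
active = hot_swap(old, Adapter("scsi0-new"), finish_one_request, events)
```

Because the notification is logged only after `pending_io` reaches zero, a requester acting on that message never removes an adapter with requests still in flight.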

Another embodiment of the invention includes a method of hot swapping a mass storage adapter connected to an operational computer, comprising: receiving a request for the suspension of all I/O communications to the existing mass storage adapter, requesting the operating system to suspend all communications to the existing mass storage adapter, waiting for the completion of any pending I/O communications to the existing mass storage adapter, notifying the requester that all I/O is suspended, disabling power to the existing mass storage adapter, removing the existing mass storage adapter from the computer, inserting a new mass storage adapter into the computer at the same location as the mass storage adapter, enabling power to the new mass storage adapter and restarting communications between the computer and the new mass storage adapter.
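Relative to the previous embodiment, this one brackets the physical exchange with slot power control. A minimal sketch of the resulting ordering constraints follows; the `HotPlugSlot` class and its methods are hypothetical and simply refuse out-of-order steps, such as pulling an adapter from a slot that is still powered.

```python
from enum import Enum


class SlotPower(Enum):
    OFF = "off"
    ON = "on"


class HotPlugSlot:
    """Hypothetical model of one independently powered adapter slot."""

    def __init__(self, number, adapter=None):
        self.number = number
        self.adapter = adapter
        self.power = SlotPower.ON if adapter else SlotPower.OFF
        self.io_suspended = False

    def suspend_io(self):
        self.io_suspended = True

    def power_off(self):
        if not self.io_suspended:
            raise RuntimeError("suspend I/O before removing slot power")
        self.power = SlotPower.OFF

    def remove(self):
        if self.power is SlotPower.ON:
            raise RuntimeError("cannot remove an adapter from a powered slot")
        removed, self.adapter = self.adapter, None
        return removed

    def insert(self, adapter):
        if self.power is SlotPower.ON or self.adapter is not None:
            raise RuntimeError("slot must be empty and unpowered for insertion")
        self.adapter = adapter

    def power_on(self):
        if self.adapter is None:
            raise RuntimeError("no adapter in slot")
        self.power = SlotPower.ON

    def resume_io(self):
        if self.power is SlotPower.OFF:
            raise RuntimeError("restore slot power before resuming I/O")
        self.io_suspended = False


def swap_adapter(slot, new_adapter):
    """Hot swap in the embodiment's order; returns the removed adapter."""
    slot.suspend_io()          # suspend I/O (draining of pending I/O elided)
    slot.power_off()           # disable power to the existing adapter
    old = slot.remove()        # physically remove it
    slot.insert(new_adapter)   # same location as the old adapter
    slot.power_on()            # enable power to the new adapter
    slot.resume_io()           # restart communications
    return old
```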

Yet another embodiment of the invention includes a method of hot swapping an adapter connected to an operational computer including at least one canister, wherein the canister connects to one or more existing adapters, comprising: receiving a request for the suspension of all I/O communications to the existing adapters, requesting the operating system to suspend all communications to the existing adapters, waiting for the completion of any pending I/O communications to the existing adapters, notifying the requester that all I/O is suspended, disabling power to the selected canister with the existing adapters, while maintaining power to the computer and other adapters, removing a selected mass storage adapter from the canister, adding a new mass storage adapter in the canister at the same location as the selected mass storage adapter, enabling power to the adapters in the canister, restarting communications to the existing adapters and restarting communications between the computer and the new mass storage adapter.
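In the canister embodiment the unit of power control is the canister, so every adapter in it must be suspended, not only the one being replaced, while adapters in other canisters keep running. The sketch below assumes a hypothetical `Canister` record mapping slot numbers to adapter names, and returns an event log so the ordering can be inspected.

```python
from dataclasses import dataclass, field


@dataclass
class Canister:
    """Hypothetical canister: one power domain holding several adapter slots."""
    name: str
    slots: dict = field(default_factory=dict)  # slot number -> adapter name
    powered: bool = True


def swap_in_canister(canisters, target, slot, new_adapter):
    """Replace the adapter in `slot` of canister `target`; returns an event log."""
    can = canisters[target]
    log = []
    # Every adapter in the canister loses power, so all of them are suspended.
    for adapter in can.slots.values():
        log.append(("suspend", adapter))
    can.powered = False
    # Record which canisters are still running while the target is unpowered.
    log.append(("still_powered", sorted(c.name for c in canisters.values() if c.powered)))
    log.append(("removed", can.slots[slot]))
    can.slots[slot] = new_adapter  # same location as the removed adapter
    can.powered = True
    # Restart I/O to the existing adapters and to the new one.
    for adapter in can.slots.values():
        log.append(("restart", adapter))
    return log
```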

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a top-level block diagram showing a fault tolerant computer system of one embodiment of the present invention, including a mass storage adapter and a network adapter.

FIG. 2 is a block diagram showing a first embodiment of a multiple bus configuration connecting I/O adapters and a network of microcontrollers to the clustered CPUs of the fault tolerant computer system, shown in FIG. 1.

FIG. 3 is a block diagram showing a second embodiment of a multiple bus configuration connecting canisters containing I/O adapters and a network of microcontrollers to the clustered CPUs of the fault tolerant computer system, shown in FIG. 1.

FIG. 4 is a block diagram illustrating a portion of the fault tolerant computer system, shown in FIG. 1.

FIG. 5 is a block diagram illustrating certain device driver components of the NetWare Operating System and one embodiment of a configuration manager which reside on the fault tolerant computer system, shown in FIG. 1.

FIG. 6 is one embodiment of a flowchart illustrating the process by which a user performs a hot add of an adapter in the fault tolerant computer system, shown in FIG. 2.

FIG. 7 is one embodiment of a flowchart showing the process by which a user performs a hot add of an adapter on a canister on a fault tolerant computer system, shown in FIG. 3.

FIG. 8 is one embodiment of a flowchart showing the process by which a user performs a hot swap of an adapter on a fault tolerant computer system, shown in FIGS. 2 and 3.

FIGS. 9, 9A and 9B are flowcharts showing one process by which the configuration manager may suspend and restart I/O for hot swapping network adapters under the NetWare Operating System, shown in FIG. 8.

FIGS. 10A, 10B and 10C are flowcharts showing one process by which the configuration manager may suspend and restart I/O for hot swapping mass storage adapters under the NetWare Operating System, shown in FIG. 8.

FIG. 11 is a block diagram illustrating a portion of the Windows NT Operating System and a configuration manager which both reside on the fault tolerant computer system, shown in FIGS. 2 and 3.

FIG. 12 is one embodiment of a flowchart showing the process by which the Windows NT Operating System initializes the adapter (miniport) drivers shown in FIG. 11 at boot time.

FIG. 13 is a flowchart illustrating one embodiment of a process by which a loaded adapter driver of FIG. 12 initializes itself with the configuration manager under the Windows NT Operating System.

FIG. 14 is one embodiment of a flowchart showing the process by which the configuration manager handles a request to perform the hot add of an adapter under the Windows NT Operating System, shown in FIG. 11.

FIG. 15 is one embodiment of a flowchart showing the process by which an adapter driver locates and initializes a mass storage adapter under the Windows NT Operating System in the hot add process shown in FIG. 14.

FIG. 16 is one embodiment of a flowchart showing the process by which the FindAdapter( ) routine initializes an adapter during the hot add locate and initialize process of FIG. 15.

FIG. 17 is one embodiment of a flowchart showing the process by which the configuration manager suspends and resumes the state of an adapter under the Windows NT Operating System during the hot swap shown in FIG. 8.

DETAILED DESCRIPTION OF THE INVENTION

The following detailed description presents a description of certain specific embodiments of the present invention. However, the present invention can be embodied in a multitude of different ways as defined and covered by the claims. In this description, reference is made to the drawings wherein like parts are designated with like numerals throughout.

FIG. 1 is a block diagram showing one embodiment of a fault tolerant computer system. Typically, the computer system is one server in a network of servers and communicates with client computers. Such a configuration of computers is often referred to as a client-server architecture. A fault tolerant server is useful for mission critical applications, such as the securities business, where any computer downtime can have catastrophic financial consequences. A fault tolerant computer allows a fault to be isolated so that it does not propagate through the system, thus causing little or no disruption to continuing operation. Fault tolerant systems also provide redundant components, such as adapters, so that service can continue even when one component fails.

The system includes a fault tolerant computer system 100 connected to a mass storage adapter 102 and a network adapter 104, such as for use in a Local Area Network (LAN). The mass storage adapter 102 may contain one or more of various types of device controllers: a magnetic disk controller 108 for magnetic disks 110, an optical disk controller 112 for optical disks 114, a magnetic tape controller 116 for magnetic tapes 118, a printer controller 120 for various printers 122, and any other type of controller 124 for other devices 126. For such multi-function adapters, the controllers may be connected by a bus 106 such as a PCI bus. The peripheral devices are connected to, and communicate with, each controller over a mass storage bus. In one embodiment, the mass storage bus may be a Small Computer System Interface (SCSI) bus. In a typical server configuration, more than one mass storage adapter is connected to the computer 100. The adapters and I/O devices are off-the-shelf products; for instance, example vendors of a magnetic disk controller 108 and magnetic disks 110 include Qlogic, Intel, and Adaptec. Each magnetic hard disk may hold multiple gigabytes of data.

The network adapter 104 typically includes a network controller 128. The network adapter 104, which is sometimes referred to as a network interface card (NIC), allows digital communication between the fault tolerant computer system 100 and other computers (not shown), such as a network of servers, via a connection 130. In certain configurations there may be more than one network controller adapter connected to the computer 100. For LAN embodiments of the network adapter, the protocol used may be, for example, Ethernet (IEEE 802.3), Token Ring (IEEE 802.5), Fiber Distributed Data Interface (FDDI), Asynchronous Transfer Mode (ATM) or any other conventional protocol. Typically, the mass storage adapter 102 and the network adapter 104 are connected to the computer using a standards-based bus system. In different embodiments of the present invention, the standards-based bus system could be based on the Peripheral Component Interconnect (PCI), Microchannel, SCSI, Industry Standard Architecture (ISA) or Extended ISA (EISA) architectures.

FIG. 2 shows one embodiment of the bus structure of the fault tolerant computer system 100. A number `n` of central processing units (CPUs) 200 are connected through a host bus 202 to a memory controller 204, which allows for access to memory by the other system components. In one embodiment, there are four CPUs 200, each being an Intel Pentium Pro microprocessor. However, many other general purpose or special purpose parts and circuits could be used. A number of bridges 206, 208 and 209 connect the host bus to, respectively, three high speed I/O bus systems 212, 214, and 216. The bus systems 212, 214 and 216, referred to as PC buses, may be any standards-based bus system such as PCI, ISA, EISA and Microchannel. In one embodiment of the invention, the bus system 212 is PCI. Alternative embodiments of the invention employ a proprietary bus. An ISA Bridge 218 is connected to the bus system 212 to support legacy devices such as a keyboard, one or more floppy disk drives and a mouse. A network of microcontrollers 225 is also interfaced to the ISA bus 226 to monitor and diagnose the environmental health of the fault tolerant system. A more detailed description of the microcontroller network 225 is contained in the U.S. patent application Ser. No. 08/942,402, "Diagnostic and Managing Distributed Processor System" to Johnson.

A bridge 230 and a bridge 232 connect, respectively, the PC bus 214 with the PC bus 234 and the PC bus 216 with the PC bus 236 to provide expansion slots for peripheral devices or adapters. Placing the devices 238 and 240 on the separate PC buses 234 and 236, respectively, reduces the risk that an adapter failure or other transient I/O error will affect the entire bus, corrupt data, bring the entire system down, or prevent the system administrator from communicating with the system. The adapter devices 238 and 240 are electrically and mechanically connected to the PC buses 234 and 236 by PC slots such as slot 241. Hence, an adapter is "plugged" into a slot. In one embodiment of the invention, each slot may be independently powered on and off.

FIG. 3 shows an alternative bus structure embodiment of the fault tolerant computer system 100. The two PC buses 214 and 216 contain a set of bridges 242-248 to a set of PC bus systems 250-256. As with the PC buses 214 and 216, the PC buses 250-256 can be designed according to any type of bus architecture including PCI, ISA, EISA, and Microchannel. The PC buses 250-256 are connected, respectively, to a canister 258, 260, 262 and 264. The canisters 258-264 are casings for a detachable bus system and provide multiple PC slots 266 for adapters. In one embodiment, each canister may be independently powered on and off.

FIG. 4 is a block diagram illustrating hardware and software components of the computer system 100 relating to hot plugging an adapter. A hot plug user interface 302 accepts requests by a user, such as a system manager or administrator, to perform a hot add or hot swap of an adapter 310. The user interface 302 preferably communicates through an industry standard operating system 304, such as Windows NT or NetWare, with the hot plug system driver 306 and an adapter driver 308. In an alternative embodiment of the invention, a proprietary operating system may be utilized.

The hot plug system driver 306 controls the adapter driver 308 for a hot plug operation. The hot plug system driver 306 stops and resumes the communications between the adapter 310 and the adapter driver 308. During a hot add or swap of the adapter 310, the hot plug hardware 312 deactivates the power to the PC slots 241 and 266 (FIGS. 2 and 3). One embodiment of the hot plug hardware 312 may include the network of microcontrollers 225 (FIGS. 2 and 3) to carry out this functionality.

The adapter 310 could be any type of peripheral device such as a network adapter, a mass storage adapter, or a sound board. Typically, however, adapters involved in providing service to client computers over a network, such as mass storage, network and communications adapters, would be the primary candidates for hot swapping or adding in a fault tolerant computer system such as the computer system 100 (FIG. 1). The adapter 310 is physically connected to the hot plug hardware by PC slots such as slots 241 and 266 (FIGS. 2 and 3).

FIGS. 6, 7, and 8 illustrate a generic process by which alternative embodiments of the present invention perform the hot add and swap of devices. Some embodiments of the invention use commercial operating systems, such as Macintosh O.S., OS/2, VMS, DOS, Windows 3.1/95/98 or UNIX to support hot add and swap.

In alternative embodiments of the invention, the hot plug system executes on an I/O platform. In a first architectural embodiment of the invention, the I/O platform and its devices plug in as a single adapter card into a slot. In a second architectural embodiment of the invention, the bridge is integrated onto the motherboard, and hot plug adapters plug in behind the bridge. In a third architectural embodiment of the invention, the I/O platform is plugged in as an option to control non-intelligent devices, as will be recognized by those skilled in the art.

In the second architectural embodiment, the I/O platform can be any industry standard I/O board such as, for example, the IQ80960RP Evaluation Board running the IxWorks operating system from Wind River Systems, Inc. In the second architectural embodiment, a hardware device module (HDM) or adapter driver executes on the motherboard. The HDM is designed to communicate via messages with any type of operating system executing on the computer. These messages correspond to primitives that allow hot add and hot swap of adapters plugged into the motherboard.

The following sections describe embodiments of the invention operating on the computers shown in FIGS. 2 and 3 under NetWare Operating System and Windows NT. As previously mentioned, FIGS. 6, 7, and 8 illustrate a generic process by which alternative embodiments of the present invention perform the hot add and swap of devices. First, a process for hot add and swap of an adapter under the NetWare Operating System will be described according to the processes shown in FIGS. 6, 7 and 8. Second, a process for hot add and swap of an adapter 310 under the Windows NT Operating System environment will be described according to the processes shown in FIGS. 6, 7, and 8.

Adapter Hot Plug with NetWare Operating System

FIG. 5 is a block diagram illustrating the system components of the NetWare Operating System and an embodiment of the software components of the invention. A configuration manager 500 is responsible for managing all or some of the adapters on the PC buses 234 and 236 (FIG. 2), or 250, 252, 254 and 256 (FIG. 3). The configuration manager 500 keeps track of the configuration information for every managed adapter located on the fault tolerant computer system 100. The configuration manager 500 also allocates resources for every managed adapter and initializes each managed adapter's registers during a hot swap operation. The registers of an adapter 310 are components or intermediate memories whose values either cause the adapter to take a certain action or indicate the status of the adapter.

Novell has created two interfaces for adapter drivers to communicate with the NetWare Operating System (FIGS. 1 and 4). First, Novell has provided the Open Datalink Interface (ODI) for network drivers. Second, Novell has created the NetWare Peripheral Architecture (NWPA) for mass storage adapters. Each of these interfaces is described below.

With respect to network device drivers, such as a driver 524, ODI was created to allow multiple LAN adapters, such as the adapter 104, to coexist on network systems, and to facilitate the task of writing device driver software. The ODI specification describes the set of interfaces (FIG. 1) and software modules used by hardware vendors to interface with the NetWare operating system. At the core of the ODI is the link support layer (LSL) 502. The LSL 502 is the interface between drivers and protocol stacks (not shown). Any LAN driver written to ODI specifications can communicate with any ODI protocol stack via the LSL 502. A protocol stack is a layered communication architecture in which each layer has a well-defined interface.

Novell has provided a set of support modules that create the interface to the LSL 502. These modules are a collection of procedures, macros and structures. These modules are the media support module (MSM) 504, which contains general functions common to all drivers, and the topology specific modules (TSM) 506. The TSM 506 provides support for the standardized media types of token ring, Fiber Distributed Data Interface (FDDI) and Ethernet. The MSM 504 manages the details of interfacing ODI multi-link interface drivers (MLID) to the LSL 502 and the NetWare Operating System. The MSM 504 typically handles all of the generic initialization and run-time issues common to all drivers. The topology specific module or TSM 506 manages operations that are unique to a specific media type. The Hardware Specific Modules (HSM) are created by each adapter vendor for each type of adapter 310. The HSM 508 contains the functionality to initialize, reset and shut down the adapter 310. The HSM 508 also handles packet transmission and reception to and from each adapter 310.
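The division of labor among these modules can be illustrated as a layered interface: a generic module owns state and bookkeeping, and a vendor module supplies only the hardware-specific hooks. The Python sketch below is an analogy under that assumption; the class and method names are hypothetical, do not correspond to actual ODI entry points, and the topology specific layer is omitted.

```python
from abc import ABC, abstractmethod


class HardwareSpecificModule(ABC):
    """Vendor-written layer: the only code that touches adapter hardware."""

    @abstractmethod
    def initialize(self) -> None: ...

    @abstractmethod
    def reset(self) -> None: ...

    @abstractmethod
    def shutdown(self) -> None: ...

    @abstractmethod
    def transmit(self, frame: bytes) -> None: ...


class MediaSupportModule:
    """Generic layer shared by every driver: state and bookkeeping around the HSM."""

    def __init__(self, hsm, deliver):
        self.hsm = hsm
        self.deliver = deliver  # upcall toward the LSL and the protocol stacks
        self.active = False
        self.sent = 0
        self.received = 0

    def start(self):
        self.hsm.initialize()
        self.active = True

    def stop(self):
        self.hsm.shutdown()
        self.active = False

    def send(self, frame):
        if not self.active:
            raise RuntimeError("adapter is not active")
        self.hsm.transmit(frame)
        self.sent += 1

    def frame_received(self, frame):
        # Invoked by the HSM when the adapter hands up a received frame.
        self.received += 1
        self.deliver(frame)


class LoopbackHSM(HardwareSpecificModule):
    """Toy vendor module that echoes transmitted frames back as received ones."""

    def __init__(self):
        self.msm = None

    def initialize(self):
        pass

    def reset(self):
        pass

    def shutdown(self):
        pass

    def transmit(self, frame):
        self.msm.frame_received(frame)
```

Swapping in a different vendor module changes none of the generic code, which mirrors the text's point that each vendor writes only the HSM.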

With respect to mass storage device drivers, such as a driver 526, the NetWare Peripheral Architecture (NWPA) 510 is a software architecture developed by Novell which provides an interface for mass storage developers to interface with the NetWare operating system. The NWPA 510 is divided into two components: a host adapter module (HAM) 512 and a custom device module (CDM) 513. The HAM 512, which is typically written by the mass storage adapter vendor, contains information on the host adapter hardware. The CDM 513 is the component of the NWPA 510 that regulates the mass storage adapters 102.

The main purpose of the Filter CDM 516 is to locate each HAM 512, register adapter events, and process the I/O suspend and I/O restart requests from the configuration manager 500. These commands will be discussed in greater detail below with reference to FIG. 10.
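One plausible way for a filter module to honor I/O suspend and I/O restart requests is to hold new requests while suspended, report when in-flight requests have drained, and replay the held requests on restart. The sketch below assumes exactly that behavior; `SuspendableFilter`, `submit_to_ham` and `quiescent` are hypothetical names, not NWPA interfaces.

```python
from collections import deque


class SuspendableFilter:
    """Hypothetical filter sitting between the OS and a host adapter module."""

    def __init__(self, submit_to_ham):
        self.submit_to_ham = submit_to_ham  # forwards a request to the adapter driver
        self.suspended = False
        self.held = deque()                 # requests that arrive while suspended
        self.in_flight = 0                  # forwarded but not yet completed

    def request(self, io):
        if self.suspended:
            self.held.append(io)
        else:
            self.in_flight += 1
            self.submit_to_ham(io)

    def complete(self):
        # Called when the adapter finishes one forwarded request.
        self.in_flight -= 1

    def suspend(self):
        # I/O suspend: stop forwarding; the caller then polls quiescent().
        self.suspended = True

    def quiescent(self):
        # Safe to remove the adapter only when suspended and nothing is outstanding.
        return self.suspended and self.in_flight == 0

    def restart(self):
        # I/O restart: resume forwarding, oldest held request first.
        self.suspended = False
        while self.held:
            self.request(self.held.popleft())
```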

A NetWare user interface 518 initiates the requests to the configuration manager 500 to freeze and restart communications to a specified adapter 310. A remote Simple Network Management Protocol (SNMP) agent 520 can also send requests to freeze and resume communications to the configuration manager 500 through a local SNMP agent 522. SNMP is one of the protocols of the TCP/IP suite and is specifically designed for use in managing computer systems. In one embodiment of the invention, the managed computers would be similar to the fault tolerant computer system of FIG. 1 and connected in a server network via connection 130.

FIG. 6 is a flowchart illustrating one embodiment of the process to hot add an adapter 310. For instance, the process shown in FIG. 6 may be utilized by a fault tolerant computer system 100 containing the bus structure shown in FIG. 2. The process described by FIG. 6 is generic to various implementations of the invention. The following description of FIG. 6 focuses on the hot add of an adapter 310 (FIG. 4) under the NetWare Operating System.

Starting in state 600, a user inserts an adapter 310 into one of the PC bus slots, such as the slot 241. At this point, the hot plug hardware 312 has not turned on the power to the adapter's slot, although the fault tolerant computer system 100 is operational. Since the adapter's slot is not powered and is physically isolated from any other devices which are attached to the bus 234, the adapter will not be damaged by a short circuit during the insertion process, and will not create problems for the normal operation of the fault tolerant computer system 100. Moving to state 602, the configuration manager 500 is notified that the adapter is now in the slot, and requests the hot plug hardware 312 to supply power to the adapter's slot. In one embodiment of the invention, the hot plug hardware automatically detects the presence of the newly added adapter 310 and informs the configuration manager 500. In another embodiment of the invention, the user notifies the hot plug hardware 312 that the adapter 310 is connected to one of the PC slots 241. The process by which a slot 241 and adapter 238 are powered on and attached to a shared bus 234 is described in the U.S. application Ser. No. 08/942,402, "Diagnostic and Managing Distributed Processor System" to Johnson.
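States 600 and 602, together with the configuration step 604 described next, can be modeled with two hypothetical objects: hot plug hardware that powers individual slots and may or may not detect insertions, and a configuration manager that reacts to an insertion notice. The names below are illustrative assumptions, and configuration is reduced to a placeholder.

```python
class HotPlugHardware:
    """Hypothetical hot plug hardware: per-slot power, optional presence detect."""

    def __init__(self, presence_detect):
        self.presence_detect = presence_detect
        self.occupied = set()
        self.powered = set()

    def seat_adapter(self, slot):
        # State 600: the adapter is inserted while its slot remains unpowered.
        self.occupied.add(slot)
        # Report whether the hardware itself noticed the new adapter.
        return self.presence_detect

    def power_on(self, slot):
        if slot not in self.occupied:
            raise RuntimeError(f"slot {slot} is empty")
        self.powered.add(slot)


class ConfigurationManager:
    """Hypothetical configuration manager reacting to an insertion notice."""

    def __init__(self, hardware):
        self.hw = hardware
        self.configured = set()

    def adapter_inserted(self, slot):
        # State 602: ask the hot plug hardware to power the slot.
        self.hw.power_on(slot)
        # State 604: placeholder for writing configuration space registers.
        self.configured.add(slot)


def insert_adapter(hw, cm, slot):
    """Seat an adapter; the manager is notified only if the hardware detects it."""
    if hw.seat_adapter(slot):
        cm.adapter_inserted(slot)
```

Without presence detection, the user's own notification plays the role of the hardware report, which in this sketch is simply a direct call to `adapter_inserted`.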

Once an adapter 310 is added to the computer system, system resources must be allocated for the adapter 310. The configuration manager 500 then configures the newly added adapter 310 (state 604) by writing information to the adapter's configuration space registers.
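The slot-level hot add of FIG. 6 (states 600 through 604) reduces to a short interlock: insert only into an unpowered slot, power the slot, then configure the adapter. The following is a minimal Python sketch under stated assumptions; `Slot`, `HotPlugHardware`, and `hot_add` are hypothetical names, and the configuration step is passed in as a callback rather than modeling real configuration-space writes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Slot:
    """Hypothetical model of one hot-plug slot, such as slot 241."""
    number: int
    powered: bool = False
    adapter: Optional[str] = None
    configured: bool = False


@dataclass
class HotPlugHardware:
    """Hypothetical stand-in for the hot plug hardware 312; it only switches power."""
    log: List[str] = field(default_factory=list)

    def power_on(self, slot: Slot) -> None:
        slot.powered = True
        self.log.append(f"slot {slot.number}: power on")


def hot_add(slot: Slot, adapter: str, hw: HotPlugHardware,
            configure: Callable[[Slot], None]) -> None:
    # State 600: a card may only be inserted while the slot is unpowered,
    # which keeps it isolated from the live bus.
    if slot.powered:
        raise RuntimeError(f"slot {slot.number} is powered; cannot insert")
    slot.adapter = adapter
    # State 602: once notified, the configuration manager asks the hot plug
    # hardware to power the slot.
    hw.power_on(slot)
    # State 604: the configuration manager programs the new adapter's
    # configuration space (modeled here by the callback).
    configure(slot)
```

For example, `hot_add(Slot(241), "nic", HotPlugHardware(), lambda s: setattr(s, "configured", True))` leaves the slot powered and marked configured; a second insertion into the now-powered slot raises an error.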

Traditionally, an adapter's resources are allocated by the Basic Input/Output System (BIOS), a set of service routines invoked during the start-up phase of the fault tolerant computer system 100. The BIOS programs the I/O ports or memory locations of each adapter in the fault tolerant computer system 100. However, because a newly added adapter was not present when the BIOS initialization routines ran, the configuration manager 500 must configure the new adapter in the same manner that the BIOS programs other adapters of the same kind. The process by which the configuration space of a newly added adapter 310 is configured is described in U.S. application Ser. No. 08/941,268, "Configuration Management Method for Hot Adding and Hot Replacing Devices" to Mahalingam.
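The referenced application describes the actual configuration method. As a hedged illustration of what programming a new adapter "the way the BIOS would" involves, the sketch below uses the standard PCI technique for sizing and placing a 32-bit BAR: write all ones, read the value back, and the lowest writable address bit gives the region size; the region is then placed at a naturally aligned address. `FakeBar`, `bar_size`, and `assign_bar` are hypothetical names, `FakeBar` is an in-memory register standing in for real configuration-space access, and 64-bit and prefetchable BARs are ignored.

```python
from typing import Tuple


class FakeBar:
    """Hypothetical simulated 32-bit BAR. Address bits below the region size
    are hardwired to zero, as on real hardware; bit 0 reports the space type."""

    def __init__(self, size: int, io: bool) -> None:
        self.size = size                      # must be a power of two
        self.type_bits = 0x1 if io else 0x0   # bit 0 = 1 marks I/O space
        self.value = self.type_bits

    def write(self, value: int) -> None:
        self.value = (value & ~(self.size - 1) & 0xFFFFFFFF) | self.type_bits

    def read(self) -> int:
        return self.value


def bar_size(bar: FakeBar) -> int:
    """Size a BAR: write all ones, read back, then restore the original value."""
    original = bar.read()
    bar.write(0xFFFFFFFF)
    probe = bar.read()
    bar.write(original)
    # I/O BARs reserve the low 2 bits, memory BARs the low 4 bits.
    mask = 0xFFFFFFFC if probe & 0x1 else 0xFFFFFFF0
    return (~(probe & mask) + 1) & 0xFFFFFFFF


def assign_bar(bar: FakeBar, next_free: int) -> Tuple[int, int]:
    """Place the region at the next naturally aligned address.

    Returns (base, new next_free)."""
    size = bar_size(bar)
    base = (next_free + size - 1) & ~(size - 1)
    bar.write(base)
    return base, base + size
```

For a 4 KB memory BAR, `assign_bar(FakeBar(0x1000, io=False), 0xF0000010)` places the region at `0xF0001000`, the first 4 KB boundary at or above the next free address.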

FIG. 7 is a flowchart illustrating the process to hot add an adapter 310 on one of the canisters 258-264. The process described by FIG. 7 is generic to multiple embodiments of the invention. For instance, the process shown in FIG. 7 may be utilized by a fault tolerant computer system 100 containing the bus structure shown in FIG. 3. The following description of FIG. 7 focuses on the hot add of an adapter 310 on a canister under the NetWare Operating System.

Starting in state 700, all devices already operating in the selected canister are located, and activity involving those adapters is suspended. In one embodiment, the SNMP agent 520 or the NetWare User Interface 518 locates all devices, and initiates the request for the suspension for every adapter, such as the adapter 310, on the canister. The configuration manager 500 suspends the I/O for every adapter that is located on the canister which was selected by the user to receive the new card. In another embodiment, the SNMP agent 520 or the NetWare User Interface 518 requests the configuration manager to suspend the canister. The configuration manager 500 then locates all devices and suspends the I/O for each adapter located on the selected canister.

The configuration manager 500 initiates the suspension of I/O through either the NWPA 510, for the mass storage adapters 102, or the LSL 502 and MSM 504, for the network adapters 104. FIGS. 9 and 10, described below, illustrate in detail the process by which the configuration manager 500 suspends and resumes I/O to a network adapter and to a mass storage adapter, respectively.
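Suspending a canister, as described for state 700, amounts to locating every adapter on that canister and routing each suspension to the NetWare layer that owns it. A minimal sketch, assuming adapters are identified by their PCI base class code (0x01 for mass storage controllers, 0x02 for network controllers); `Adapter`, `suspend_canister`, and the two callback parameters are hypothetical stand-ins for the NWPA 510 and LSL 502/MSM 504 paths.

```python
from dataclasses import dataclass
from typing import Callable, List

BASE_CLASS_MASS_STORAGE = 0x01  # PCI base class code: mass storage controller
BASE_CLASS_NETWORK = 0x02       # PCI base class code: network controller


@dataclass
class Adapter:
    """Hypothetical record the configuration manager keeps per adapter."""
    name: str
    canister: int
    base_class: int
    active: bool = True


def suspend_canister(adapters: List[Adapter], canister: int,
                     via_nwpa: Callable[[Adapter], None],
                     via_lsl: Callable[[Adapter], None]) -> List[str]:
    """Suspend I/O for every adapter on `canister`; return the names suspended."""
    suspended = []
    for adapter in adapters:
        if adapter.canister != canister:
            continue                      # adapters on other canisters keep running
        if adapter.base_class == BASE_CLASS_MASS_STORAGE:
            via_nwpa(adapter)             # mass storage: request goes through NWPA 510
        elif adapter.base_class == BASE_CLASS_NETWORK:
            via_lsl(adapter)              # network: request goes through LSL 502 / MSM 504
        else:
            raise ValueError(f"no suspend path for {adapter.name}")
        adapter.active = False
        suspended.append(adapter.name)
    return suspended
```

Keeping the routing decision in one place means the user interface 518 or SNMP agent 520 can ask for a whole canister to be suspended without knowing which driver stack each adapter uses.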

For the embodiments of the invention that use PCI, the bus must be quiesced and power to the canister turned off. In one embodiment, the software must assert the bus reset bit as defined by the PCI specification (state 702). If power to the canister is on, the configuration manager 500 then directs the hot plug hardware 312 to disable power to the selected canister, one of the canisters 258-264 (state 704). In another embodiment, the hot plug hardware 312 itself asserts bus reset and then powers the canister down.

Proceeding to state 706, the user removes the selected canister, e.g., canister 264, and inserts an adapter into one of the PC slots 266. If the card is on a new canister that was not present during boot initialization, the hot plug hardware 312 should support sparse assignment of bus numbers in systems that require such functionality. The user then returns the canister to the fault tolerant computer system 100. At the request of the configuration manager 500, the hot plug hardware 312 then restores power to the selected canister (state 708). For PCI systems, the bus reset bit must be de-asserted (state 710). In one embodiment of the invention, the hot plug hardware performs this de-assertion; in another embodiment, the configuration manager 500 de-asserts the bus reset. The configuration manager 500 re-initializes the configuration space of each adapter that was previously in the system (state 712); because these adapters lost power during the hot add, they are in an unknown state after power is reapplied. Moving to state 714, the configuration manager 500 programs the configuration space of the new adapter. Finally, the configuration manager 500 resumes operations on all of the adapters located on the canister (state 718). For mass storage adapters 102, the configuration manager 500 notifies the NWPA 510 to resume communications; for network adapters 104, it contacts the LSL 502 to resume communications. In some embodiments of the invention, the configuration manager 500 restarts I/O to all adapters in the canister in response to a single request, while in other embodiments, the user interface 518 or SNMP agent 520 requests the configuration manager 500 to restart each adapter individually.
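The ordering of FIG. 7 matters: I/O is suspended before the bus is reset and power removed, and existing adapters are re-initialized before any adapter resumes. A minimal sketch that walks the states in that order and records each step, assuming a hypothetical `Canister` model in place of real hot plug hardware and configuration-space access.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Canister:
    """Hypothetical model of a canister (e.g., canister 264) holding adapters."""
    number: int
    adapters: List[str]
    powered: bool = True
    bus_reset: bool = False
    log: List[str] = field(default_factory=list)


def hot_add_to_canister(can: Canister, new_adapter: str, pci: bool = True) -> None:
    """Walk states 700-718 of FIG. 7, recording each step in can.log."""
    existing = list(can.adapters)
    for name in existing:
        can.log.append(f"700 suspend {name}")
    if pci:
        can.bus_reset = True                   # quiesce the bus before removing power
        can.log.append("702 assert bus reset")
    if can.powered:
        can.powered = False
        can.log.append("704 power off")
    can.adapters.append(new_adapter)           # 706: user inserts card, reseats canister
    can.log.append(f"706 insert {new_adapter}")
    can.powered = True
    can.log.append("708 power on")
    if pci:
        can.bus_reset = False
        can.log.append("710 deassert bus reset")
    for name in existing:                      # power loss left these in an unknown state
        can.log.append(f"712 reinitialize {name}")
    can.log.append(f"714 configure {new_adapter}")
    for name in can.adapters:
        can.log.append(f"718 resume {name}")
```

Setting `pci=False` drops the bus reset steps, mirroring the text's statement that bus reset handling applies to PCI systems.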

FIG. 8 is a flowchart illustrating the process by which a user performs the hot swap of an adapter. The process described by FIG. 8 is generic to various implementations of the invention. For instance, the process shown in FIG. 8 may be utilized by a fault tolerant computer system 100 shown in FIGS. 2 and 3. The following description of FIG. 8 focuses on the hot swap of an adapter 310 under the NetWare Operating System.

Before state 800, an event has occurred, such as the failure of an adapter, and the operator has been informed of the failure. The operator has procured a replacement part and has decided to repair the computer system 100 at this time. Alternatively, the operator may have some other reason to remove and replace a card, such as upgrading to a new version of the card or its firmware. The user indicates the intention to swap an adapter through the NetWare user interface 518 or a remote SNMP agent 520 (FIG. 5).

For the embodiment of the computer shown in FIG. 2, the configuration manager 500 suspends communication between the adapter to be swapped and the adapter driver 308 (state 802). For the embodiment of the computer shown in FIG. 3, the configuration manager 500 freezes communication to each adapter located on the same canister as the adapter to be swapped. FIGS. 9 and 10, described below, illustrate the process by which communication is suspended and restarted for, respectively, a network adapter and a mass storage adapter.

Next, in some embodiments, the hot plug hardware 318 asserts bus reset, if necessary, before removing power (state 804). In other embodiments, the configuration manager 500 explicitly causes bus reset to be asserted before directing the hot plug hardware 318 to remove power. For embodiments of the computer shown in FIG. 2, the configuration manager 500 then directs the hot plug hardware 318 to remove power from the slot (state 806). For embodiments of the computer shown in FIG. 3, the configuration manager 500 directs the hot plug hardware 318 to remove power from the adapter's canister (state 806).

Proceeding to state 808, in a canister system the user removes the canister containing the failed card, replaces the old adapter with the new adapter, and then reinserts the canister. In a non-canister system, the user swaps the old adapter for the new adapter in the slot.

For canister systems with a PCI bus, at state 810, the hot plug hardware 318 reapplies power to the slot or the canister. In some embodiments, the hot plug hardware 312 also removes bus reset, if necessary, after applying power (state 812). In other embodiments, the configuration manager 500 must explicitly de-assert the bus reset. For the embodiment of the computer shown in FIG. 2, the configuration manager 500 reprograms the configuration space of the replacement adapter with the same configuration as the old adapter (state 814). For the embodiment of the computer shown in FIG. 3, the configuration manager 500 reprograms the configuration space and resumes communication for each adapter located on the canister in which the adapter was swapped (state 814). Finally, in state 816, the configuration manager 500 changes each adapter's state to active.
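The main difference between the slot (FIG. 2) and canister (FIG. 3) variants of the swap is scope: which adapters must be frozen, reprogrammed, and reactivated. A minimal sketch of that scoping and of the resulting step order for states 802 through 816, assuming a hypothetical `canister_of` map from adapter name to canister number; `affected_adapters` and `hot_swap_steps` are illustrative names only.

```python
from typing import Dict, List


def affected_adapters(target: str, canister_of: Dict[str, int],
                      canister_system: bool) -> List[str]:
    """Adapters whose I/O must be frozen and later reprogrammed for the swap."""
    if not canister_system:
        return [target]                      # FIG. 2: only the adapter being swapped
    can = canister_of[target]
    # FIG. 3: every adapter sharing the canister loses power too.
    return sorted(a for a, c in canister_of.items() if c == can)


def hot_swap_steps(target: str, canister_of: Dict[str, int],
                   canister_system: bool, pci: bool = True) -> List[str]:
    """Ordered steps for states 802-816 of FIG. 8."""
    group = affected_adapters(target, canister_of, canister_system)
    unit = f"canister {canister_of[target]}" if canister_system else f"slot of {target}"
    steps = [f"802 freeze {a}" for a in group]
    if pci:
        steps.append("804 assert bus reset")
    steps.append(f"806 power off {unit}")
    steps.append(f"808 user replaces {target}")
    steps.append(f"810 power on {unit}")
    if pci:
        steps.append("812 deassert bus reset")
    # The replacement receives the same configuration as the adapter it replaced.
    steps += [f"814 reprogram {a}" for a in group]
    steps += [f"816 activate {a}" for a in group]
    return steps
```

With `{"nic0": 1, "scsi0": 1, "nic1": 2}`, swapping `nic0` in a canister system freezes both `nic0` and `scsi0`, while the slot variant touches only the target adapter.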

FIGS. 9A and 9B illustrate the process by which the configuration manager 500 suspends and restarts the communication of a network adapter, such as the adapter 104. The configuration manager 500 maintains information about the configuration space of each adapter in the system. However, the configuration manager 500 does not know the logical number that the NetWare Operating System has assigned to each adapter, and it needs that logical number to direct the NetWare Operating System to shut down a particular adapter. FIGS. 9A and 9B illustrate one embodiment of the process by which the configuration manager 500 obtains the logical number of an adapter.

Starting in a decision state 900 in FIG. 9A, the configuration manager 500 checks whether the adapter's class is of the type "LAN" (or network). For PCI systems, each adapter maintains information in its PCI configuration space indicating its class. If the configuration manager 500 identifies an adapter as being of the LAN class, the configuration manager 500 proceeds to state 902. Otherwise, the configuration manager performs an alternative routine to handle the request to suspend or restart I/O communications (state 904). For example, if the class of the adapter 310 were of type "SCSI" (or mass storage), the configuration manager 500 would follow the process described in FIG. 10 for freezing the communication for a mass storage adapter 102.
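The class test at decision state 900 can be read directly from the PCI configuration header, where byte 0x0B holds the base class code and byte 0x0A the subclass code. A minimal sketch, assuming the configuration space is available as a `bytes` object; `classify` and its return strings are hypothetical labels for the branches described above.

```python
BASE_CLASS_MASS_STORAGE = 0x01   # PCI base class: mass storage controller
BASE_CLASS_NETWORK = 0x02        # PCI base class: network controller
SUBCLASS_SCSI = 0x00             # mass storage subclass: SCSI bus controller


def classify(config: bytes) -> str:
    """Route a suspend/restart request by the adapter's PCI class code."""
    base_class = config[0x0B]    # base class code byte of the configuration header
    subclass = config[0x0A]      # subclass code byte
    if base_class == BASE_CLASS_NETWORK:
        return "lan"             # LAN class: continue at state 902 (FIG. 9A)
    if base_class == BASE_CLASS_MASS_STORAGE:
        # Mass storage adapters follow the FIG. 10 process instead.
        return "scsi" if subclass == SUBCLASS_SCSI else "mass storage"
    return "other"               # some other alternative routine (state 904)
```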

As defined by the PCI specification, the base address registers (BARs) define the starting addresses of the I/O and memory ranges that have been allocated to each adapter in the system's address space. Also as defined by the PCI specification, an adapter can have up to six BARs; it is up to the adapter vendor to implement one or more BARs for I/O or memory addressing, as desired. Each of the six BAR entries in an adapter's configuration space identifies its resource type: bit zero is 0 if the BAR describes memory space and 1 if it describes I/O space.
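Reading and decoding the six BARs follows directly from that layout: in a type 0 configuration header they are little-endian 32-bit registers at offsets 0x10 through 0x24, and bit zero selects I/O or memory. A minimal sketch, assuming the configuration space is available as a byte buffer; `read_bars` and `decode_bar` are hypothetical names, and 64-bit memory BARs (which occupy two consecutive registers) are not handled.

```python
import struct
from typing import List, Tuple

BAR0_OFFSET = 0x10   # first of the six BARs in a type 0 configuration header


def read_bars(config: bytes) -> List[int]:
    """Return the six BAR registers at 0x10, 0x14, ..., 0x24 (little-endian)."""
    return list(struct.unpack_from("<6I", config, BAR0_OFFSET))


def decode_bar(bar: int) -> Tuple[str, int]:
    """Return ("io" | "mem", base address) for one 32-bit BAR value."""
    if bar & 0x1:                    # bit 0 = 1: I/O space; bits 1:0 are not address
        return "io", bar & 0xFFFFFFFC
    return "mem", bar & 0xFFFFFFF0   # bit 0 = 0: memory space; bits 3:0 are not address
```

An unimplemented BAR reads as zero and therefore decodes as an empty memory entry, which a caller would skip.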

The configuration manager 500 reads all of the BARs in the configuration space for each adapter 310, looking for a BAR which describes I/O resources. For each such BAR, the LSL 502 configuration spaces are searched for an I/O port address which matches this BAR. This process continues until a match is found, identifying the LSL 502 configuration space which describes this adapter. If no match is found, then LSL 502 has no logical board describing this adapter, and no driver exists to service this board.

At state 902, the variable "x" is initialized to zero. The xth BAR is examined to see whether it is an I/O class address (states 906 and 908). If the BAR is not an I/O address, x is incremented (state 912), and the configuration manager 500 checks whether all six BARs have been examined (state 914). If they have, the configuration manager 500 returns a status indicating "driver not loaded"; otherwise, it returns to state 908 to examine the next BAR.
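The BAR loop of FIG. 9A, combined with the search of the LSL 502 configuration spaces described above, can be sketched as follows. This is a minimal illustration, assuming a hypothetical `lsl_io_ports` map from logical board number to I/O base port in place of the real LSL tables; `find_logical_board` and `DRIVER_NOT_LOADED` are illustrative names, not NetWare interfaces.

```python
from typing import Dict, List, Union

DRIVER_NOT_LOADED = "driver not loaded"


def find_logical_board(bars: List[int],
                       lsl_io_ports: Dict[int, int]) -> Union[int, str]:
    """Return the LSL logical board number matching one of the adapter's I/O BARs.

    `lsl_io_ports` is a hypothetical map of logical board number -> I/O base
    port, standing in for the LSL 502 configuration spaces.
    """
    x = 0                                    # state 902
    while x < len(bars):                     # up to six BARs (state 914 check)
        bar = bars[x]                        # states 906/908: examine the xth BAR
        if bar & 0x1:                        # bit 0 set: I/O class address
            port = bar & 0xFFFFFFFC
            for board, io_port in lsl_io_ports.items():
                if io_port == port:          # a match identifies the logical board
                    return board
        x += 1                               # state 912
    return DRIVER_NOT_LOADED                 # no LSL board describes this adapter
```

For an adapter whose BARs are `[0xFEBF0000, 0xE001, 0, 0, 0, 0]`, the memory BAR is skipped and the I/O BAR at port `0xE000` is compared against the LSL entries; if none matches, the result is "driver not loaded".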

Referring to the state 910, the c