United States Patent
6253334
Amdahl, et al.
June 26, 2001
Title
Three bus server architecture with a legacy PCI bus and mirrored I/O PCI buses
Abstract
A fault-tolerant computer system includes a processor and a memory, connected to a system bus. The system includes at least two mirrored circuits, at least two mirrored IO devices, a detection means and a re-route means. The two mirrored circuits each include an interface to the system bus, and an IO interface. The input/output interface of each of the mirrored circuits is connected to one of the two mirrored IO devices. Detection means detect a load imbalance in the data transfer between the system bus and either one of the two mirrored IO devices. In response to the detection of a load imbalance, the re-route means re-routes the data transfer between the system bus and the other one of the two mirrored IO devices. In another embodiment, a fault-tolerant computer system includes a first, second and third IO bus, legacy devices, and two IO devices. The first IO bus is connected to the system bus. The legacy devices are connected to the first IO bus. The second and third IO buses are each connected to the system bus. The IO devices are each connected to a corresponding one of the second and third IO buses. Another embodiment of the invention can be characterized as an apparatus for transferring data between at least one transport protocol stack and a plurality of network adapters coupled to a computer network that supports recovery from network adapter and connection failure.
Inventors:
Amdahl; Carlton G.
(Fremont,
CA
)
, Smith; Dennis H.
(Fremont,
CA
)
, Agneta; Don A.
(Morgan Hill,
CA
)
Assignee:
Micron Electronics, Inc.
(Nampa,
ID
)
Appl. No.:
941995
Filed:
October 1, 1997
Current U.S. Class:
714/4
714/43
718/105
709/239
710/312
Field of Search:
710/128,100,101,126,129,131,2,17,102 714/4,7,6,8,43-44 709/105,230,238,239,250,251,253
U.S. Patent Documents
4057847
November 1977
Lowell et al.
4100597
July 1978
Fleming et al.
4449182
May 1984
Rubinson et al.
4672535
June 1987
Katzman et al.
4692918
September 1987
Elliott et al.
4695946
September 1987
Andreasen et al.
4707803
November 1987
Anthony, Jr. et al.
4769764
September 1988
Levanon
4774502
September 1988
Kimura
4821180
April 1989
Gerety et al.
4835737
May 1989
Herrig et al.
4894792
January 1990
Mitchell et al.
4949245
August 1990
Martin et al.
4968977
November 1990
Chinnaswamy et al.
4999787
March 1991
McNally et al.
5006961
April 1991
Monico
5007431
April 1991
Donehoo, III
5033048
July 1991
Pierce et al.
5051720
September 1991
Kittirutsunetorn
5073932
December 1991
Yossifor et al.
5103391
April 1992
Barrett
5118970
June 1992
Olson et al.
5121500
June 1992
Arlington et al.
5123017
June 1992
Simpkins et al.
5136708
August 1992
Lapourtre et al.
5136715
August 1992
Hirose et al.
5138619
August 1992
Fasang et al.
5157663
October 1992
Major et al.
5210855
May 1993
Bartol
5245615
September 1993
Treu
5247683
September 1993
Holmes et al.
5253348
October 1993
Scalise
5261094
November 1993
Everson et al.
5265098
November 1993
Mattson et al.
5266838
November 1993
Gerner
5269011
December 1993
Yanai et al.
5272382
December 1993
Heald et al.
5272584
December 1993
Austruy et al.
5276863
January 1994
Heider
5277615
January 1994
Hastings et al.
5280621
January 1994
Barnes et al.
5283905
February 1994
Saadeh et al.
5307354
April 1994
Cramer et al.
5311397
May 1994
Harshberger et al.
5311451
May 1994
Barrett
5317693
May 1994
Cuenod et al.
5329625
July 1994
Kannan et al.
5337413
August 1994
Lui et al.
5351276
September 1994
Doll, Jr. et al.
5367670
November 1994
Ward et al.
5379184
January 1995
Barraza et al.
5379409
January 1995
Ishikawa
5386567
January 1995
Lien et al.
5388267
February 1995
Chan et al.
5402431
March 1995
Saadeh et al.
5404494
April 1995
Garney
5423025
June 1995
Goldman et al.
5426740
June 1995
Bennett
5430717
July 1995
Fowler et al.
5430845
July 1995
Rimmer et al.
5432715
July 1995
Shigematsu et al.
5432946
July 1995
Allard et al.
5438678
August 1995
Smith
5440748
August 1995
Sekine et al.
5448723
September 1995
Rowett
5455933
October 1995
Schieve et al.
5460441
October 1995
Hastings et al.
5463766
October 1995
Schieve et al.
5465349
November 1995
Geronimi et al.
5471617
November 1995
Farrand et al.
5471634
November 1995
Giorgio et al.
5473499
December 1995
Weir
5483419
January 1996
Kaczeus, Sr. et al.
5485550
January 1996
Dalton
5485607
January 1996
Lomet et al.
5487148
January 1996
Komori et al.
5490252
February 1996
Macera et al.
5491791
February 1996
Glowny et al.
5493574
February 1996
McKinley
5493666
February 1996
Fitch
5513314
April 1996
Kandasamy et al.
5513339
April 1996
Agrawal et al.
5517646
May 1996
Piccirillo et al.
5519851
May 1996
Bender et al.
5526289
June 1996
Dinh et al.
5528409
June 1996
Cucci et al.
5530810
June 1996
Bowman
5533193
July 1996
Roscoe
5533198
July 1996
Thorson
5535326
July 1996
Baskey et al.
5539883
July 1996
Allon et al.
5542055
July 1996
Amini et al.
5546272
August 1996
Moss et al.
5548712
August 1996
Larson et al.
5555510
September 1996
Verseput et al.
5559764
September 1996
Chen et al.
5559958
September 1996
Farrand et al.
5559965
September 1996
Oztaskin et al.
5560022
September 1996
Dunstan et al.
5564024
October 1996
Pemberton
5566299
October 1996
Billings et al.
5566339
October 1996
Perholtz et al.
5568610
October 1996
Brown
5568619
October 1996
Blackledge et al.
5572403
November 1996
Mills
5577205
November 1996
Hwang et al.
5579487
November 1996
Meyerson et al.
5579491
November 1996
Jeffries et al.
5579528
November 1996
Register
5581712
December 1996
Herrman
5581714
December 1996
Amini et al.
5584030
December 1996
Husak et al.
5586250
December 1996
Carbonneau et al.
5588121
December 1996
Reddin et al.
5588144
December 1996
Inoue et al.
5592610
January 1997
Chittor
5592611
January 1997
Midgely et al.
5596711
January 1997
Burckhartt et al.
5598407
January 1997
Bud et al.
5602758
February 1997
Lincoln et al.
5604873
February 1997
Fite et al.
5606672
February 1997
Wade
5608865
March 1997
Midgely et al.
5608876
March 1997
Cohen et al.
5615207
March 1997
Gephardt et al.
5621159
April 1997
Brown et al.
5621892
April 1997
Cook
5622221
April 1997
Genga, Jr. et al.
5625238
April 1997
Ady et al.
5627962
May 1997
Goodrum et al.
5628028
May 1997
Michelson
5630076
May 1997
Saulpaugh et al.
5631847
May 1997
Kikinis
5632021
May 1997
Jennings et al.
5636341
June 1997
Matsushita et al.
5638289
June 1997
Yamada et al.
5644470
July 1997
Benedict et al.
5644731
July 1997
Liencres et al.
5651006
July 1997
Fujino et al.
5652832
July 1997
Kane et al.
5652833
July 1997
Takizawa et al.
5652839
July 1997
Giorgio et al.
5652892
July 1997
Ugajin
5652908
July 1997
Douglas et al.
5655081
August 1997
Bonnell et al.
5655083
August 1997
Bagley
5655148
August 1997
Richman et al.
5659682
August 1997
Devarakonda et al.
5664118
September 1997
Nishigaki et al.
5664119
September 1997
Jeffries et al.
5666538
September 1997
DeNicola
5668943
September 1997
Attanasio et al.
5668992
September 1997
Hammer et al.
5669009
September 1997
Buktenica et al.
5671371
September 1997
Kondo et al.
5675723
October 1997
Ekrot et al.
5680288
October 1997
Carey et al.
5682328
October 1997
Roeber et al.
5684671
November 1997
Hobbs et al.
5689637
November 1997
Johnson et al.
5696895
December 1997
Hemphill et al.
5696899
December 1997
Kalwitz
5696949
December 1997
Young
5696970
December 1997
Sandage et al.
5701417
December 1997
Lewis et al.
5704031
December 1997
Mikami et al.
5708775
January 1998
Nakamura
5708776
January 1998
Kikinis
5712754
January 1998
Sides et al.
5715456
February 1998
Bennett et al.
5717570
February 1998
Kikinis
5721935
February 1998
DeSchepper et al.
5724529
March 1998
Smith et al.
5726506
March 1998
Wood
5727207
March 1998
Gates et al.
5729767
March 1998
Jones et al.
5732266
March 1998
Moore et al.
5737708
April 1998
Grob et al.
5737747
April 1998
Vishlitzky et al.
5740378
April 1998
Rehl et al.
5742514
April 1998
Bonola
5742833
April 1998
Dea et al.
5747889
May 1998
Raynham et al.
5748426
May 1998
Bedingfield et al.
5752164
May 1998
Jones
5754797
May 1998
Takahashi
5758165
May 1998
Shuff
5758352
May 1998
Reynolds et al.
5761033
June 1998
Wilhelm
5761045
June 1998
Olson et al.
5761085
June 1998
Giorgio
5761462
June 1998
Neal et al.
5761707
June 1998
Aiken et al.
5764924
June 1998
Hong
5764968
June 1998
Ninomiya
5765008
June 1998
Desai et al.
5765198
June 1998
McCrocklin et al.
5767844
June 1998
Stoye
5768541
June 1998
Pan-Ratzlaff
5768542
June 1998
Enstrom et al.
5771343
June 1998
Hafner et al.
5774640
June 1998
Kurio
5774645
June 1998
Beaujard et al.
5774741
June 1998
Choi
5777897
July 1998
Giorgio
5778197
July 1998
Dunham
5781703
July 1998
Desai et al.
5781716
July 1998
Hemphill et al.
5781744
July 1998
Johnson et al.
5781767
July 1998
Inoue et al.
5781798
July 1998
Beatty et al.
5784555
July 1998
Stone
5784576
July 1998
Guthrie et al.
5787019
July 1998
Knight et al.
5787459
July 1998
Stallmo et al.
5787491
July 1998
Merkin et al.
5790775
August 1998
Marks et al.
5790831
August 1998
Lin et al.
5793948
August 1998
Asahi et al.
5793987
August 1998
Quackenbush et al.
5793992
August 1998
Steele et al.
5794035
August 1998
Golub et al.
5796185
August 1998
Takata et al.
5796580
August 1998
Komatsu et al.
5796934
August 1998
Bhanot et al.
5796981
August 1998
Abudayyeh et al.
5797023
August 1998
Berman et al.
5798828
August 1998
Thomas et al.
5799036
August 1998
Staples
5799196
August 1998
Flannery
5801921
September 1998
Miller
5802269
September 1998
Poisner et al.
5802298
September 1998
Imai et al.
5802305
September 1998
McKaughan et al.
5802324
September 1998
Wunderlich et al.
5802393
September 1998
Begun et al.
5802552
September 1998
Fandrich et al.
5802592
September 1998
Chess et al.
5803357
September 1998
Lakin
5805804
September 1998
Laursen et al.
5805834
September 1998
McKinley et al.
5809224
September 1998
Schultz et al.
5809256
September 1998
Najemy
5809287
September 1998
Stupek, Jr. et al.
5809311
September 1998
Jones
5809555
September 1998
Hobson
5812748
September 1998
Ohran et al.
5812750
September 1998
Dev et al.
5812757
September 1998
Okamoto et al.
5812858
September 1998
Nookala et al.
5815117
September 1998
Kolanek
5815647
September 1998
Buckland et al.
5815651
September 1998
Litt
5815652
September 1998
Ote et al.
5821596
October 1998
Miu et al.
5822547
October 1998
Boesch et al.
5826043
October 1998
Smith et al.
5829046
October 1998
Tzelnic et al.
5835719
November 1998
Gibson et al.
5835738
November 1998
Blackledge, Jr. et al.
5838932
November 1998
Alzien
5838935
November 1998
Davis et al.
5841964
November 1998
Yamaguchi
5841991
November 1998
Russell
5845061
December 1998
Miyamoto et al.
5845095
December 1998
Reed et al.
5850546
December 1998
Kim
5852720
December 1998
Gready et al.
5852724
December 1998
Glenn, II et al.
5857074
January 1999
Johnson
5857102
January 1999
McChesney et al.
5864653
January 1999
Tavallaei et al.
5864654
January 1999
Marchant
5864713
January 1999
Terry
5867730
February 1999
Leyda
5875307
February 1999
Ma et al.
5875308
February 1999
Egan et al.
5875310
February 1999
Buckland et al.
5878237
March 1999
Olarig
5878238
March 1999
Gan et al.
5881311
March 1999
Woods
5884027
March 1999
Garbus et al.
5884049
March 1999
Atkinson
5886424
March 1999
Kim
5889965
March 1999
Wallach et al.
5892898
April 1999
Fujii et al.
5892915
April 1999
Duso et al.
5892928
April 1999
Wallach et al.
5893140
April 1999
Vahalia et al.
5898846
April 1999
Kelly
5898888
April 1999
Guthrie et al.
5905867
May 1999
Giorgio
5907672
May 1999
Matze et al.
5909568
June 1999
Nason
5911779
June 1999
Stallmo et al.
5913034
June 1999
Malcolm
5918057
June 1999
Chou et al.
5922060
July 1999
Goodrum
5923854
July 1999
Bell et al.
5930358
July 1999
Rao
5935262
August 1999
Barrett et al.
5936960
August 1999
Stewart
5938751
August 1999
Tavallaei et al.
5941996
August 1999
Smith et al.
5964855
October 1999
Bass et al.
5983349
November 1999
Kodama et al.
5987554
November 1999
Liu et al.
5987621
November 1999
Duso et al.
5987627
November 1999
Rawlings, III
6012130
January 2000
Beyda et al.
Foreign Patent Documents
0 866 403 A1
Sep., 1998
EP
04 333 118
Nov., 1992
JP
05 233 110
Sep., 1993
JP
07 093 064
Apr., 1995
JP
07 261 874
Oct., 1995
JP
Other References
Lyons, Computer Reseller News, Issue 721, pp. 61-62, Feb. 3, 1997, "ACC Releases Low-Cost Solution for ISPs.". .
M2 Communications, M2 Presswire, 2 pages, Dec. 19, 1996, "Novell IntranetWare Supports Hot Pluggable PCI from NetFRAME.". .
Rigney, PC Magazine, 14(17): 375-379, Oct. 10, 1995, "The One for the Road (Mobile-aware capabilities in Windows 95).". .
Shanley, and Anderson, PCI System Architecture, Third Edition, p. 382, Copyright 1995. .
ftp.cdrom.com/pub/os2/diskutil/, PHDX software, phdx.zip download, Mar. 1995, "Parallel Hard Disk Xfer.". .
Cmasters, Usenet post to microsoft.public.windowsnt.setup, Aug. 1997, "Re: FDISK switches.". .
Hildebrand, N., Usenet post to comp.msdos.programmer, May 1995, "Re: Structure of disk partition into.". .
Lewis, L., Usenet post to alt.msdos.batch, Apr. 1997, "Re: Need help with automating FDISK and FORMAT.". .
Netframe, http://www.netframe-support.com/technology/datasheets/data.htm, before Mar. 1997, "Netframe ClusterSystem 9008 Data Sheet.". .
Simos, M., Usenet post to comp.os.msdos.misc, Apr. 1997, "Re: Auto FDISK and FORMAT.". .
Wood, M. H., Usenet post to comp.os.netware.misc, Aug. 1996, "Re: Workstation duplication method for WIN95.". .
Gorlick, M., Conf. Proceedings: ACM/ONR Workshop on Parallel and Distribution Debugging, pp. 175-181, 1991, "The Flight Recorder: An Architectural Aid for System Monitoring.". .
IBM Technical Disclosure Bulletin, 92A+62947, pp. 391-394, Oct. 1992, Method for Card Hot Plug Detection and Control. .
Davis, T, Usenet post to alt.msdos.programmer, Apr. 1997, "Re: How do I create an FDISK batch file?". .
Davis, T., Usenet post to alt.msdos.batch, Apr. 1997, "Re: Need help with automating FDISK and FORMAT . . . ". .
NetFrame Systems Incorporated, Doc. No. 78-1000226-01, pp. 1-2, 5-8, 359-404, and 471-512, Apr. 1996, "NetFrame Clustered Multiprocessing Software: NW0496 CD-ROM for Novell® NetWare® 4.1 SMP, 4.1, and 3.12.". .
Shanley, and Anderson, PCI System Architecture, Third Edition, Chapter 15, pp. 297-302, Copyright 1995, "Intro To Configuration Address Space.". .
Shanley, and Anderson, PCI System Architecture, Third Edition, Chapter 16, pp. 303-328, Copyright 1995, "Configuration Transactions.". .
Sun Microsystems Computer Company, Part No. 802-5355-10, Rev. A, May 1996, "Solstice SyMON User's Guide.". .
Sun Microsystems, Part No. 802-6569-11, Release 1.0.1, Nov. 1996, "Remote Systems Diagnostics Installation & User Guide.". .
PCI Hot-Plug Specification, Preliminary Revision for Review Only, Revision 0.9, pp. i-vi, and 1-25, Mar. 5, 1997. .
SES SCSI-3 Enclosure Services, X3T10/Project 1212-D/Rev 8a, pp. i, iii-x, 1-76, and I-1 (index), Jan. 16, 1997. .
Compaq Computer Corporation, Technology Brief, pp. 1-13, Dec. 1996, "Where Do I Plug the Cable? Solving the Logical-Physical Slot Numbering Problem.". .
Standard Overview, http://www.pc-card.com/stand_overview.html#1, 9 pages, Jun. 1990, "Detailed Overview of the PC Card Standard.". .
Digital Equipment Corporation, datasheet, 140 pages, 1993, "DECchip 21050 PCI-TO-PCI Bridge.". .
NetFRAME Systems Incorporated, News Release, 3 pages, referring to May 9, 1994, "NetFRAME's New High-Availability ClusterServer Systems Avoid Scheduled as well as Unscheduled Downtime.". .
Compaq Computer Corporation, Phoenix Technologies, LTD, and Intel Corporation, specification, 55 pages, May 5, 1994, "Plug & Play BIOS Specification.". .
NetFRAME Systems Incorporated, datasheet, 2 pages, Feb. 1996, "NF450FT Network Mainframe.". .
NetFRAME Systems Incorporated, datasheet, 9 pages, Mar. 1996, "NetFRAME Cluster Server 8000.". .
Joint work by Intel Corporation, Compaq, Adaptec, Hewlett Packard, and Novell, presentation, 22 pages, Jun. 1996, "Intelligent I/O Architecture.". .
Lockareff, M., HTINews, http://www.hometoys.com/htinews/dec96/articles/Ionworks.htm, 2 pages, Dec. 1996, "Loneworks--An Introduction.". .
Schofield, M.J., http://www.omegas.co.uk/CAN/canworks.htm, 4 pages, Copyright 1996, 1997, "Controller Area Network--How CAN Works.". .
NTRR, Ltd, http://www.nrtt.demon.co.uk/cantech.html, 5 pages, May 28, 1997, "CAN: Technical Overview.". .
Herr, et al., Linear Technology Magazine, Design Features, pp. 21-23, Jun. 1997, "Hot Swapping the PCI Bus.". .
PCI Special Interest Group, specification, 35 pages, Draft For Review Only, Jun. 15, 1997, "PCI Bus Hot Plug Specification.". .
Microsoft Corporation, file:///A|/Rem_devs.htm, 4 pages, Copyright 1997, updated Aug. 13, 1997, "Supporting Removable Devices Under Windows and Windows NT.". .
Haban, D. & D. Wybranietz, IEEE Transaction on Software Engineering, 16(2):197-211, Feb. 1990, "A Hybrid Monitor for Behavior and Performance Analysis of Distributed Systems.".
Primary Examiner:
Beausoliel, Jr.; Robert W.
Assistant Examiner:
Baderman; Scott T.
Attorney, Agent or Firm:
Knobbe, Martens, Olson & Bear LLP
Parent Case Text
PRIORITY
The benefit under 35 U.S.C. § 119(e) of the following U.S. Provisional Applications is hereby claimed: "Three Bus Server Architecture With A Legacy PCI Bus and Mirrored I/O PCI Buses," application Ser. No. 60/046,490, filed on May 13, 1997, and "Means For Allowing Two Or More Network Interface Controller Cards To Appear as One Card To An Operating System," application Ser. No. 60/046,491, filed on May 13, 1997.
Claims
What is claimed is:
1. A fault-tolerant computer system with a processor and a memory, connected to a system bus, and said fault-tolerant computer system comprising:
a first input/output (IO) bus connected to the system bus;
a plurality of legacy devices connected to said first IO bus;
a second and a third IO bus each connected to the system bus;
at least two IO devices each connected to a corresponding one of said second and said third IO buses, said at least two IO devices providing redundant access to a common data and/or network; and
load balancing means for balancing data transfers between said processor and said at least two IO devices.
2. The fault-tolerant computer system of claim 1, further comprising:
re-routing means for re-routing data transfers between said processor and an inaccessible one of said at least two IO devices to an other of said at least two IO devices.
3. The fault-tolerant computer system of claim 1, further comprising:
at least two bridge interface units each connected between a corresponding one of said at least two IO buses and said system bus.
4. The fault-tolerant computer system of claim 1, wherein further:
the at least two IO devices are redundant storage devices for storage of mirrored data.
5. The fault-tolerant computer system of claim 1, wherein further:
the at least two IO devices are redundant network interface devices for transferring data between the computer system and a network.
6. The fault-tolerant computer system of claim 4, further comprising:
mirroring means for mirroring write data transfers from said processor to said redundant storage devices.
7. The fault-tolerant computer system of claim 5, further comprising:
load balancing means for balancing data transfers between said processor and said redundant network interface devices.
8. A fault-tolerant computer system with a processor and a memory, connected to a system bus, said fault-tolerant computer system comprising:
at least two mirrored circuits, each with a system bus interface and an input/output (IO) interface, and each system bus interface connected to the system bus;
at least two mirrored IO devices each connected to a corresponding one of the IO interfaces, said IO devices being redundant network interface devices for transferring data between the computer system and a network;
a detection system capable of detecting a load imbalance in a data transfer between the system bus and one of said at least two mirrored IO devices;
a router configured to re-route the data transfer between said system bus and another of the at least two mirrored IO devices responsive to said detection system;
at least two incoming data protocol stacks, and each of said at least two incoming data protocol stacks adapted to contain a plurality of data packets from the network for processing; and
the memory device containing instructions that when executed transfer an incoming data packet to a selected one of said at least two incoming data protocol stacks to balance the processing.
9. A fault-tolerant computer system with a processor and a memory, connected to a system bus, said fault-tolerant computer system comprising:
at least two mirrored circuits each with a system bus interface and an input/output (IO) interface and each system bus interface connected to the system bus;
at least two mirrored IO devices each connected to a corresponding one of the IO interfaces, said IO devices being redundant network interface devices for transferring data between the computer system and a network;
a detector configured to detect a load imbalance in a data transfer between the system bus and one of said at least two mirrored IO devices;
a router configured to re-route the data transfer between said system bus and another of the at least two mirrored IO devices responsive to said detector;
at least two outgoing data protocol stacks, each of said at least two outgoing data protocol stacks adapted to contain a plurality of data packets destined for the network for processing; and
the memory device containing instructions that when executed transfer an outgoing data packet to a selected one of said at least two outgoing data protocol stacks to balance the processing.
10. A fault-tolerant computer system with a processor and a memory, connected to a system bus, and said fault-tolerant computer system comprising:
at least two mirrored circuits, each with a system bus interface connected to the system bus and an input/output (IO) interface;
at least two mirrored input/output (IO) devices each connected to a corresponding one of the IO interfaces;
a detection means for detecting a load imbalance between protocol stack pairs in a data transfer between the system bus and a one of said at least two mirrored IO devices;
a re-route means for re-routing the data transfer between said system bus and another of the at least two mirrored IO devices responsive to said detection means;
wherein the at least two mirrored IO devices are redundant network interface devices for transferring data between the computer system and a network; and
a failure detection means for detecting a load imbalance in a data transfer between the system bus and an inaccessible one of said at least two mirrored IO devices.
11. An apparatus for transferring data between at least one transport protocol stack and a plurality of network adapters coupled to a computer network that supports recovery from network adapter and connection failure, comprising:
a first interface bound to the at least one transport protocol stack; and
a second interface bound to the plurality of network adapters;
wherein the first interface is configured to receive a first MAC-level packet from a transport protocol stack and to forward the first packet through the second interface to a network adapter in the plurality of network adapters, and wherein the second interface is configured to receive a second packet from a network adapter in the plurality of network adapters and to forward the second packet through the first interface to a transport protocol; and
a failure managing means for detecting a failed network adapter in the plurality of network adapters and for rerouting packets to a different network adapter in the plurality of network adapters, wherein said failure managing means operates in an operating system-independent manner.
12. The apparatus of claim 11, wherein said failed network adapter is a failed primary network adapter and said different network adapter is a secondary network adapter.
13. The apparatus of claim 11, further comprising:
load-sharing means for performing load sharing by selectively routing packets to network adapters in the plurality of network adapters.
14. The apparatus of claim 11, wherein the apparatus can function as an NDIS intermediate driver, wherein:
the first interface is configured to present a virtual adapter for binding to at least one protocol stack; and
the second interface is configured to present a virtual transport protocol stack for binding to a network adapter in the plurality of network adapters.
15. The apparatus of claim 11, wherein all adapters in the plurality of adapters bound to the second interface are configured to the same physical address.
16. The apparatus of claim 11, wherein the apparatus is implemented at the MAC layer and below.
17. The apparatus of claim 11, wherein the apparatus can function as a prescan protocol stack for examining packets flowing between protocol stacks and drivers.
18. The apparatus of claim 11, wherein said failure managing means operates in a manner that is independent of network adapter hardware.
Description
RELATED APPLICATION
The subject matter of U.S. Application entitled "Means For Allowing Two Or More Network Interface Controller Cards To Appear As One Card To An Operating System," filed on Oct. 1, 1997, application Ser. No. 08/943,379, is related to this application.
Appendices
Appendix A, which forms a part of this disclosure, is a list of commonly owned copending U.S. Patent applications. Each one of the applications listed in Appendix A is hereby incorporated herein in its entirety by reference thereto.
Appendix B, which forms part of this disclosure, is a copy of the U.S. provisional patent application filed May 13, 1997, entitled "Three Bus Server Architecture With A Legacy PCI Bus and Mirrored I/O PCI Buses," and assigned application Ser. No. 60/046,490.
Copyright Authorization
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a system for enhancing the performance of a computational server connected to both data storage devices and networks, and more particularly to a system that provides load balancing and fault tolerance capabilities.
2. Related Art
Personal computers (PCs) have undergone evolutionary changes since the original models based on the Intel 8088 microprocessor, such as the International Business Machines Corporation (IBM) PC and other IBM-compatible machines. As the popularity of PCs has grown, so has the demand for more advanced features and increased capability, reliability and speed. Higher order microprocessors such as the Intel 80286, 80386, 80486, and more recently, the Pentium® series have been developed. The speed of the fastest of these processors, the Pentium® II series, is 266 MHz, as opposed to the 8 MHz clock speed of the 8088 microprocessor.
Faster bus architectures have been developed to support the higher processor speeds. Modern computer systems typically include one or more processors coupled through a system bus to main memory. The system bus also typically couples to a high bandwidth expansion bus, such as the Peripheral Component Interconnect (PCI) bus, which operates at 33 or 66 MHz. High speed devices such as small computer system interface (SCSI) adapters, network interface cards (NICs), video adapters, etc. can be coupled to a PCI bus. An older type of low bandwidth bus, such as the Industry Standard Architecture (ISA) bus, also referred to as the AT bus, is generally coupled to the system bus as well. This bus operates at 6 MHz. Attached to the ISA bus are various low speed devices such as the keyboard, monitor, Basic Input/Output System (BIOS), and parallel and communications ports. These devices are known as legacy devices because they trace their lineage, their legacy, back to the initial PC architecture introduced by IBM in 1981.
With the enhanced processor and bus speeds, the PC is now used as a server, providing high speed data transfers between, for example, a network and a storage device. There are, however, several constraints inherent in the current PC architecture which limit its performance as a server. First, the legacy devices and the high speed devices compete for limited bus bandwidth, and thereby degrade system performance. Second, whereas the original PC operated as a standalone and did not affect other PCs when there was a system failure, the PC/Server must be fault-tolerant, i.e., able to maintain operation despite the failure of individual components.
What is needed is a way to move the PC from a standalone model to a server model. In doing so, the inherent conflict of high and low bandwidth buses must be resolved, fault-tolerance must be provided, and throughput should be enhanced.
SUMMARY
An embodiment of the present invention provides a fault-tolerant computer system with a processor and a memory, connected to a system bus. The system includes at least two mirrored circuits, at least two mirrored input/output (IO) devices, a detection means and a re-route means. The two mirrored circuits each include an interface to the system bus, and an input/output interface. The input/output interface of each of the mirrored circuits is connected to one of the two mirrored IO devices. Detection means detect a load imbalance in the data transfer between the system bus and either one of the two mirrored IO devices. In response to the detection of a load imbalance, the re-route means re-routes the data transfer between the system bus and the other one of the two mirrored IO devices.
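By way of illustration only, the detect-and-re-route behavior described above can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the names MirroredDevice, detect_imbalance and route_transfer, the pending_bytes load measure, and the 25% threshold are all hypothetical and do not appear in this disclosure.

```python
from dataclasses import dataclass


@dataclass
class MirroredDevice:
    """Hypothetical model of one of the two mirrored IO devices."""
    name: str
    pending_bytes: int = 0  # assumed load measure: outstanding transfer bytes


def detect_imbalance(a, b, threshold=0.25):
    """Return the more heavily loaded device when the two loads differ by
    more than `threshold` of the combined load, otherwise None.
    The threshold value is an illustrative assumption."""
    total = a.pending_bytes + b.pending_bytes
    if total == 0:
        return None
    if abs(a.pending_bytes - b.pending_bytes) / total <= threshold:
        return None
    return a if a.pending_bytes > b.pending_bytes else b


def route_transfer(size, preferred, other, threshold=0.25):
    """Send a transfer to `preferred` unless it is the overloaded device,
    in which case re-route the transfer to `other`."""
    overloaded = detect_imbalance(preferred, other, threshold)
    target = other if overloaded is preferred else preferred
    target.pending_bytes += size
    return target


io0 = MirroredDevice("io0", pending_bytes=900)
io1 = MirroredDevice("io1", pending_bytes=100)
chosen = route_transfer(50, preferred=io0, other=io1)
print(chosen.name)
```

In this sketch the decision is made per transfer, so the load comparison is repeated each time a transfer is issued; an actual embodiment could equally sample the load at intervals.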
In another embodiment, a fault-tolerant computer system includes a first, second and third input/output (IO) bus, legacy devices, and two IO devices. The first IO bus is connected to the system bus. The legacy devices are connected to the first IO bus. The second and third IO buses are each connected to the system bus. The IO devices are each connected to a corresponding one of the second and third IO buses.
Another embodiment of the invention can be characterized as an apparatus for transferring data between at least one transport protocol stack and a plurality of network adapters coupled to a computer network that supports recovery from network adapter and connection failure. The apparatus includes a first interface bound to at least one transport protocol stack. It also includes a second interface bound to the plurality of network adapters, as well as a mechanism coupled to the first interface and the second interface that receives a first MAC-level packet from a transport protocol stack through the first interface and forwards the first MAC-level packet through the second interface to a network adapter in a protocol-independent manner. The apparatus also includes a mechanism coupled to the first interface and the second interface that receives a second packet from a network adapter through the second interface and forwards the second packet through the first interface to a transport protocol stack.
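The two-interface arrangement described above can be illustrated with the following minimal sketch. It is an assumption-laden model, not the disclosed driver: the class names Adapter and IntermediateLayer, the methods send_down and receive_up, and the use of IOError to signal a failed adapter are hypothetical. Packets are treated as opaque bytes, which is one way to keep the forwarding path independent of the transport protocol.

```python
class Adapter:
    """Hypothetical stand-in for a network adapter bound to the second interface."""

    def __init__(self, name):
        self.name = name
        self.failed = False
        self.sent = []

    def send(self, packet):
        if self.failed:
            raise IOError(f"{self.name} is down")
        self.sent.append(packet)


class IntermediateLayer:
    """Sits between a transport protocol stack and several adapters."""

    def __init__(self, adapters, deliver_up):
        self.adapters = list(adapters)  # second interface: bound adapters
        self.deliver_up = deliver_up    # first interface: callback into the stack
        self.active = 0                 # index of the adapter currently in use

    def send_down(self, packet):
        """Forward a MAC-level packet from the stack to an adapter,
        moving to the next adapter if the current one has failed."""
        for _ in range(len(self.adapters)):
            adapter = self.adapters[self.active]
            try:
                adapter.send(packet)
                return adapter.name
            except IOError:
                self.active = (self.active + 1) % len(self.adapters)
        raise IOError("no working network adapter")

    def receive_up(self, packet):
        """Forward a packet received from an adapter up to the stack."""
        self.deliver_up(packet)


received = []
primary, secondary = Adapter("nic0"), Adapter("nic1")
layer = IntermediateLayer([primary, secondary], received.append)
layer.send_down(b"first")    # carried by nic0
primary.failed = True
layer.send_down(b"second")   # re-routed to nic1
layer.receive_up(b"reply")   # handed to the protocol stack
```

Because the layer never inspects packet contents, the same forwarding and recovery path serves any protocol stack bound to the first interface.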
According to another aspect of the present invention, the apparatus can function as a prescan protocol stack for examining packets flowing between protocol stacks and drivers.
DESCRIPTION OF THE FIGURES
FIG. 1 is a block diagram illustrating one embodiment of a server computer and client computers linked in a network through network interface cards.
FIG. 2 is a flow diagram illustrating one embodiment of the overall process of detecting faulty network interface cards and automatically switching from a primary network interface card to a secondary network interface card.
FIG. 3 is a block diagram illustrating an implementation of software modules running within a server computer under a Novell Netware network operating system.
FIG. 4 is a block diagram illustrating an implementation of software modules running within a server computer under a Microsoft® Windows® NT network operating system.
FIG. 5 is a block diagram illustrating one embodiment of the structure of a probe packet for an Ethernet network system.
FIG. 6 is a block diagram illustrating one embodiment of the structure of a probe packet for a FDDI or Token Ring network system.
FIG. 7 is a flow diagram illustrating one embodiment of a process for determining whether a network interface adapter has failed.
FIG. 8 is a block diagram of one embodiment of a MAC level packet, including a header, destination address and contents.
FIG. 9 is a flow diagram illustrating one embodiment of the steps involved in moving data packets between network interface cards and protocol stacks.
FIG. 10 is a flow diagram illustrating one embodiment of the steps involved in load sharing data packets across a plurality of network interface cards.
FIG. 11 is a hardware block diagram of a server with a legacy backbone 46 and a mirrored block 48.
FIG. 12 shows an embodiment of the dual I/O buses with redundant storage links.
FIG. 13 shows the dual I/O buses with redundant network links.
FIGS. 14A-B show an embodiment of the dual I/O buses of a computer system with bridges and canisters which support Hot-Add and Hot-Swap.
FIG. 15 is a map describing the distribution of address spaces to buses in the hierarchical, multi-PCI bus computer system shown in FIGS. 14A-B.
DESCRIPTION
The present invention includes a system providing failure detection and re-routing of network packets in a computer having multiple network interface cards (NICs) connected as groups (MULTISPAN groups) each to a common network segment. In addition, embodiments of the invention include load sharing to distribute network packet traffic across the NICs in a group. Further, the present invention may provide this benefit to all traffic regardless of the network protocol used to route the traffic (i.e., in a protocol independent manner).
Fault detection and recovery is accomplished by "MULTISPAN", a process operating within the system. For each group of NICs, if there is a failure on virtually any component related to network traffic, the MULTISPAN process detects the interruption of the data flow and determines which NIC is no longer working. MULTISPAN directs traffic through only the working NICs until the failed NIC is again able to send and receive traffic reliably. Restoring a NIC to reliable operation may involve such steps as replacing a failed NIC (in a computer which supports the hot replacement of failed components), reconnecting or replacing a cable, or replacing a failed network switch or router. By placing each NIC in the server on a separate path to the network, MULTISPAN will normally keep the system running until repairs can be accomplished. Being able to schedule repairs decreases the cost of owning and operating the computer system.
The MULTISPAN system can be implemented in many different forms, as discussed below. Programming languages such as C, C++, Cobol, Fortran, Basic or any other conventional language can be employed to provide the functions of the MULTISPAN system. In addition, software related to the MULTISPAN system can be stored within many types of programmed storage devices. A programmed storage device can be a Random Access Memory, Read-Only Memory, floppy disk, hard disk, CD-ROM or the like.
In one embodiment, the present invention identifies one NIC, called the primary, by which the entire group is identified. Some operating systems disallow more than a single NIC on a single network segment. For such operating systems, this embodiment uses the primary to represent the entire group to the operating system. The remaining NICs in the group are hidden from the operating system.
In one embodiment of the invention, network failures are detected by sending "probe" packets within a MULTISPAN group from the primary NIC to the secondary NIC(s) and vice versa. If a probe packet fails to arrive at the target NIC, the failing path is determined and a recovery procedure is performed. The MULTISPAN process confirms which source NIC has failed by repeatedly sending packets to every other NIC in the group until, by process of elimination, the failing NIC is determined. If the failing NIC is a primary NIC, the MULTISPAN process stops routing network traffic through this unreachable/failed NIC. Traffic is thereafter directed through one of the remaining NIC(s), which is designated as the new primary (this process of designating a new primary when the current one fails is called fail-over). MULTISPAN continues to attempt to send probe packets to and from the failing NIC and, should probe packets once again be successfully delivered, the NIC is returned to service as a secondary.
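The process of elimination described above can be illustrated with a minimal sketch in C. It assumes the probe results have already been collected into a matrix; the names `delivered`, `find_failed_nic` and `MAX_NICS` are hypothetical and do not come from the MULTISPAN implementation. A NIC is treated as the failing one when no probe to or from it was delivered while every other NIC exchanged at least one probe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NICS 8   /* illustrative upper bound on group size */

/*
 * delivered[s][d] is true when a probe sent from NIC s arrived at NIC d.
 * Returns the index of the single NIC that exchanged no probes at all,
 * or -1 when no NIC or more than one NIC fits that description.
 */
int find_failed_nic(bool delivered[MAX_NICS][MAX_NICS], size_t nic_count)
{
    assert(nic_count <= MAX_NICS);

    int suspect = -1;
    for (size_t n = 0; n < nic_count; n++) {
        bool any_success = false;
        for (size_t peer = 0; peer < nic_count; peer++) {
            if (peer == n)
                continue;
            if (delivered[n][peer] || delivered[peer][n]) {
                any_success = true;
                break;
            }
        }
        if (!any_success) {
            if (suspect != -1)
                return -1;      /* two candidates: cannot decide */
            suspect = (int)n;
        }
    }
    return suspect;
}
```

With only two NICs in a group a lost probe implicates both ends, so this sketch reports no decision in that case; a real implementation would need additional evidence, such as ordinary traffic reception, to break the tie.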
In an embodiment of the present invention, the traffic load for a network segment is shared among all NICs in the group connected to the segment. Traffic inbound to the server from the network segment may arrive through any NIC in the group, and be properly delivered by MULTISPAN to the operating system. In some situations, all inbound traffic arrives through a single NIC (usually the primary), while in others traffic may arrive through all NICs at once. Traffic outbound from the server to the network segment is directed through some or all NICs in the group according to some algorithm which may vary from one embodiment to another, or may vary within one embodiment from one group to another.
FIG. 1 is an illustration of a server computer 10 linked through a network backbone 12 to client computers 14 and 16. The server computer 10 can be any well-known personal computer such as those based on an Intel microprocessor, Motorola microprocessor, Cyrix microprocessor or Alpha microprocessor. Intel microprocessors such as the Pentium®, Pentium® Pro and Pentium® II are well-known within the art. The server computer 10 includes a group of network interface cards (NICs) 18, 20, 22 which provide communications between the server computer 10 and the network backbone 12. Similarly, the client computer 14 includes a network interface card 24 and the client computer 16 includes a network interface card 26 for communicating with the network backbone 12. The network backbone 12 may be a cable such as a 10B2 Thin Ethernet cable, an Ethernet 10BT workgroup hub such as a 3Com Hub 8/TPC, or several interconnected switches or routers 28, 30, 32 such as a Cisco Catalyst 500, as shown in FIG. 1.
As will be explained in more detail below, the client computers 14 and 16 make requests for information from the server computer 10 through the network backbone 12. Under normal circumstances, the requests made by the client computers are acknowledged through the primary network interface card 18 to the server computer 10. However, if the primary network interface card 18, or cable 34 or switch or router 28 fails, the embodiments of the present invention provide a mechanism for routing network requests through one of the secondary network interface cards 20 or 22. The re-routing of network requests is transparent to the client computer 14 or 16.
FIG. 2 depicts one embodiment of the overall process 45 of detecting errors for NICs located in a MULTISPAN group. The process 45 begins at a start state 48 and then moves to process 49 wherein a MULTISPAN group is created. During process 49, a user identifies the NICs to be grouped and issues a command to the MULTISPAN system to create a group. In one embodiment, the command is issued through a command prompt. In another embodiment, the command is issued through a management application which may be remote from or local to the computer system 10, and directed to the present invention via the simple network management protocol (SNMP) and associated SNMP agent software. If there is an error, the user is notified that there is a failure in creating the group. Otherwise, the user is presented with a prompt indicating that the MULTISPAN group was created successfully. The binding process will be discussed in more detail below. The MULTISPAN process uses the user-supplied information to associate all NICs in a particular group together and with their primary NIC.
The process 45 then moves to state 50 wherein the first MultiSpan group is retrieved. Proceeding to state 52, the first NIC in the current group is retrieved. At process state 54 the first NIC is analyzed to determine whether it is functioning properly, or is failing. The process 45 then moves to decision state 56 to determine whether any errors were detected. If a failure was detected at the decision state 56 for this NIC, the process 45 proceeds to state 58, wherein the NIC is disabled from the MULTISPAN group. The process 45 then proceeds to decision state 60 to determine whether the disabled NIC was a primary NIC. If a determination is made at state 60 that the failed NIC is a primary, the process 45 moves to process state 62 and enables the secondary NIC as a primary. The process 45 then moves to decision state 64 to determine whether the current NIC is the last NIC in the MULTISPAN group. Similarly, if a determination is made at the decision state 56 that there were no errors, the process 45 also moves to decision state 64.
If a determination is made at the decision state 64 that there are more NICs in this MULTISPAN group, then process 45 moves to state 66 to select the next NIC to analyze. The process 45 then returns to process state 54 to analyze the newly selected NIC for errors.
If a determination is made at the decision state 64 that there are no more NICs in the current MULTISPAN group, the process 45 proceeds to decision state 68 to check whether this was the last group. If a determination is made that this is not the last group, the process 45 moves to process state 70 and selects the next group. The process 45 then returns to state 52 to begin analyzing the group's NICs. If a determination is made at the decision state 68 that this is the last group, the process 45 returns to state 50 to begin checking the first group once again.
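One pass of the FIG. 2 loop can be sketched in C as follows. This is only an illustration under stated assumptions: the structures `nic` and `nic_group`, the `healthy` flag standing in for the analysis of state 54, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum nic_role { NIC_PRIMARY, NIC_SECONDARY, NIC_REMOVED };

struct nic {
    enum nic_role role;
    bool healthy;            /* stand-in for the analysis at state 54 */
};

struct nic_group {
    struct nic *nics;
    size_t count;
};

/* State 62: make the first healthy secondary the new primary. */
bool promote_secondary(struct nic_group *g)
{
    for (size_t i = 0; i < g->count; i++) {
        if (g->nics[i].role == NIC_SECONDARY && g->nics[i].healthy) {
            g->nics[i].role = NIC_PRIMARY;
            return true;
        }
    }
    return false;
}

/*
 * One pass over every group and every NIC (states 50 through 70).
 * Returns the number of NICs disabled during the pass.
 */
size_t scan_groups_once(struct nic_group *groups, size_t group_count)
{
    assert(groups != NULL || group_count == 0);

    size_t disabled = 0;
    for (size_t g = 0; g < group_count; g++) {
        for (size_t n = 0; n < groups[g].count; n++) {
            struct nic *nic = &groups[g].nics[n];
            if (nic->role == NIC_REMOVED || nic->healthy)
                continue;                       /* state 56: no error */

            bool was_primary = (nic->role == NIC_PRIMARY);
            nic->role = NIC_REMOVED;            /* state 58 */
            disabled++;
            if (was_primary)                    /* state 60 */
                promote_secondary(&groups[g]);  /* state 62 */
        }
    }
    return disabled;
}
```

In the process of FIG. 2 the scan repeats indefinitely, so a caller would invoke `scan_groups_once` from a periodic timer or a loop.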
Novell Netware Implementation
Referring now to FIG. 3, an overview of the software modules running within the server computer 10 is illustrated. In the implementation described below, the server computer 10 is running under the Novell Netware operating system. As shown, a protocol stack 100 includes a first data packet 102 and a second data packet 104. In this figure, the protocol stack is the IPX (Internetwork Packet Exchange) protocol, but it could include TCP/IP, NetBEUI or any other network packet protocols, in combination, for transmitting data across a network. As is known, client computers generally request data from server computers by attempting to read particular files within the server computer. In order for the client computers and server computer 10 to communicate across cables, the data is broken into a series of data packets. These data packets include network routing information and small portions of the requested data. The network packets are then routed from the server computer to the requesting client computer and thereafter rebuilt into the requested data file.
As is known, the link support layer (LSL) is the interface between drivers and protocol stacks within the Novell NetWare operating system. More information on the link support layer 112 and prescan drivers can be found in the Novell LAN Developer Guide (Novell Corporation, Orem Utah).
The main objectives of embodiments of the MULTISPAN process are (1) to load share LAN traffic among NICs in a group, and (2) to perform a transparent fail-over when a primary adapter in a group fails. These features may be achieved essentially without modification to the transport protocol portions of the packets. Instead, the features are achieved through system services provided for interfacing with LAN drivers and other Netware system modules like the Media Specific Module (MSM), Topology Specific Module (TSM) and Link Support Layer (LSL). The MULTISPAN process may be a totally media-dependent intermediate module.
Once drivers for the primary and secondary NICs are loaded, a MULTISPAN group can be created by issuing an MSP BIND statement specifying the slot numbers of the primary and secondary adapters. If there are any protocol stacks bound to the secondary NIC, the MULTISPAN process displays an error message and does not create a MULTISPAN group.
The user can optionally specify more than one secondary NIC when creating a group. Typically this is done to allow load sharing of the outbound LAN traffic across all the NICs. If any LAN drivers had been loaded before loading MSP.NLM, then the MSP BIND command does not create any MULTISPAN groups and displays the error message "Error locating DCT Address in Internal Table". Thus, the MSP.NLM module should be loaded before any LAN drivers. The MSP.NLM module should normally be loaded under Netware through the STARTUP.NCF file.
The MULTISPAN system allows users to configure LAN cards of the same topology but of different kinds (for example, an Intel Smart card and an Intel Pro 100B card) into a MULTISPAN group. For example, issuing the following commands will load several Ethernet cards and bind them into a MULTISPAN group.
load e100b.lan slot=10001 frame=ethernet_802.2 name=primary_8022
load e100b.lan slot=10001 frame=ethernet_802.3 name=primary_8023
load e100b.lan slot=10002 frame=ethernet_802.2 name=secondary_8022
load e100b.lan slot=10002 frame=ethernet_802.3 name=secondary_8023
bind ipx to primary_8022 net=f001
bind ipx to primary_8023 net=f002
MSP BIND 10001 10002
The MSP Bind command can also be issued specifying logical names associated with the primary and secondary NICs. For example:
MSP NAMEBIND primary_8022 secondary_8022
Once the MSP BIND or MSP NAMEBIND command has been issued, a MULTISPAN group is created for all logical frame types supported by the NIC. In addition, the probing mechanism becomes active for the current base frame. In the above example, a group is created for the frame types ETHERNET_802.2 and ETHERNET_802.3. When a group is created, MULTISPAN performs a "Link Integrity" check to make sure that all the NICs in the group are accessible from one another, using the same probing mechanism described earlier. If the check fails, appropriate error messages are displayed to the user.
The MULTISPAN NLM gains control over the network activity by registering a prescan protocol stack for sends and receives. The purpose of a prescan protocol stack is to provide the ability to examine the packets flowing between protocol stacks and drivers. MULTISPAN also intercepts the MLID registration process by patching the LSL portion of server code during the software load time. In Netware, protocol stacks send packets via the LSL using buffers known as ECBs (Event Control Blocks), which contain not only the address of the packet payload and its length but also information such as which NIC to use and what frame type to use on the medium. This information helps the LSL decide which driver interface it needs to call when sending a packet. When the LSL calls the MULTISPAN PreScan stack, it uses the same data structure to pass in information.
As illustrated in FIG. 3, a packet 102 is sent from the IPX protocol stack 100 via LSL 112. The LSL checks the registered pre-scan stack and calls the MULTISPAN PreScan send handler routine. The MULTISPAN PRESCAN process 110 determines the NIC through which the packet is to be sent.
Once the packets 102 and 104 have been analyzed by the MULTISPAN prescan module 110, they are output to their target network interface driver 120 and 122 respectively, and thereafter sent to the network backbone 12. By way of illustration, the packet 104 could be routed through the MULTISPAN prescan module 110 to a secondary network interface card driver 122 and thereafter out to the network backbone 12. It should be noted that during normal operations, Novell NetWare would only allow packets to flow through a single network interface card. MULTISPAN presents the primary NIC of each group as this single adapter, transparently applying its load sharing and failure recovery functions to the group.
Thus, data packet 104 can be sent to the LSL 112 with information to route it through the primary driver 120 to a NIC 124. However, in order to distribute the load, the MULTISPAN prescan module 110 intercepts the packet 104 and alters its destination so that it flows through the secondary driver module 122 to the NIC 126 and out to the network backbone 12.
By the same mechanism, if the primary driver 120 or primary NIC 124 fails, the MULTISPAN prescan module 110 can route the packet 102 into the secondary driver 122 and out to the NIC 126. By determining the destination of every packet coming through the LSL, the MULTISPAN prescan module 110 can completely control the ultimate destination of each packet.
During the load process, the MULTISPAN module patches the server code for the NetWare functions LSLRegisterMLIDRTag() and LSLDeRegisterMLID(). In addition, the MULTISPAN module allocates the memory needed for maintaining information pertinent to logical boards, such as the address of the DriverConfigTable, the multicast address list, and the original DriverControlEntry. Initialization related to generating NetWare Alerts is done at this point, and an AESCallBack procedure is scheduled for managing the probing functionality.
After loading the MULTISPAN.NLM, the user can configure the system to load drivers for both the primary NIC and one or more secondary NICs using the INETCFG command or by manually editing AUTOEXEC.NCF or manually loading drivers at the system console. The user can also choose the appropriate protocol stack to bind with for every instance of the primary NIC. Once this process is done, the MULTISPAN BIND command can be issued to associate NICs together into a group, and designate a primary adapter for the group.
As part of initialization, LAN drivers typically make a call to register their instance with the LSL via LSLRegisterMLIDRTag. This call manages all information pertinent to an occurrence of a logical board and assigns the caller the next available logical board. When the LSLRegisterMLIDRTag function is called by the NetWare drivers (MLIDs), control jumps to the MULTISPAN code as a result of a patch in the LSL made by the MULTISPAN module while loading. The MULTISPAN system saves the addresses of certain MLID data structures and maintains internal tables for every logical board. This information is passed to the real target of the MLID's call.
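The interception pattern described above (save the original entry point, substitute a wrapper, record the caller's data, forward the call) can be sketched in portable C with a function pointer in place of a code patch. Everything here is hypothetical: `register_entry`, `driver_config` and the other names are illustrative stand-ins, not the NetWare LSL interface or its actual signatures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BOARDS 16

/* Hypothetical stand-in for the data a driver passes when registering. */
struct driver_config {
    int slot;
};

typedef int (*register_fn)(struct driver_config *cfg);

/* Stand-in for the original routine: hands out the next logical board. */
int next_board = 0;
int original_register(struct driver_config *cfg)
{
    (void)cfg;
    return next_board++;
}

/* The entry point drivers call; replacing it models the patch. */
register_fn register_entry = original_register;

register_fn saved_register = NULL;
struct driver_config *board_table[MAX_BOARDS];

/* Wrapper: forward to the real target, then remember the driver's data. */
int intercepting_register(struct driver_config *cfg)
{
    int board = saved_register(cfg);
    if (board >= 0 && board < MAX_BOARDS)
        board_table[board] = cfg;
    return board;
}

void install_intercept(void)
{
    assert(register_entry != intercepting_register);  /* patch only once */
    saved_register = register_entry;
    register_entry = intercepting_register;
}

/* Look up the data saved for a logical board, or NULL if none. */
struct driver_config *config_for_board(int board)
{
    if (board < 0 || board >= MAX_BOARDS)
        return NULL;
    return board_table[board];
}
```

After `install_intercept()` runs, a driver calling `register_entry(&cfg)` still receives its board number from the original routine, while the wrapper keeps a per-board table that can be consulted later, for example during a fail-over.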
This technique allows embodiments of the MULTISPAN system to intercept certain dialogs between the MLID and the LSL or the protocol stacks for such purposes as establishing or changing multicast address lists and the DriverConfig Table. When a fail-over takes place, the MULTISPAN system can retrieve the multicast list from the local internal table and send a multicast update call to the switched-over NIC.
In addition to intercepting the control handler, MULTISPAN also intercepts the DriverReset call. When the DriverReset call fails for some reason (e.g., the NIC is powered off during hot swap), MSM usually removes the instance of that driver from memory and makes it impossible to activate the driver for that particular instance. By intercepting the reset call, MULTISPAN can tell MSM that the reset was successful but generate a NetWare Alert for the failure of a particular adapter. Since MULTISPAN knows which NIC is active and which is not, it ensures that there are no side effects in doing this kind of interception.
Once the MULTISPAN BIND command is issued, the bind procedure locates the appropriate logical boards corresponding to the arguments specified and creates a MULTISPAN group for all logical frames that the NIC currently supports. The primary NIC is specified first, followed by one or more secondary NICs. The MULTISPAN process forms a group only if there is a match for frame type across all NICs specified. Note that the primary NIC should have a protocol stack bound to it and that the secondaries should not have any protocol stack bound to them.
Once a MULTISPAN group of NICs is created, the probing module starts sending probe packets from the primary NIC to all secondary NICs and from all secondary NICs to the primary NIC to monitor the status of the network link. The structure of the payload portion of a probe packet is illustrated by the data structure definition below:
struct HEART_BEAT {
    LONG signature;             // LONG value of 'NMSP'
    LONG seqNo;                 // sequence number of the probe packet sent
    LONG pSource;               // pointer to structure pertaining to the source board
    LONG pDestn;                // pointer to structure pertaining to the destination board
};
struct IPX_HEADER {
    WORD checkSum;              // 0xFFFF always
    WORD packetLength;          // size of IPX_HEADER + size of HEART_BEAT
    BYTE transportControl;      // zero, not used
    BYTE packetType;            // IPX_PACKET
    BYTE destinationNetwork[4]; // zero
    BYTE destinationNode[6];    // corresponds to node address of destination board
    WORD destSocket;            // value returned by IPXOpenSocket() call
    BYTE sourceNetwork[4];      // zero
    BYTE sourceNode[6];         // corresponds to node address of source board
    WORD sourceSocket;          // value returned by IPXOpenSocket() call
};
struct PROBE_PACKET {
    struct IPX_HEADER ipxHeader;
    struct HEART_BEAT heartBeat;
};
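As an illustration of how the fields above might be filled in, the following sketch builds one probe packet. It repeats the structure definitions so that it compiles on its own. The typedefs for LONG, WORD and BYTE, the numeric value of `IPX_PACKET_TYPE`, the byte order of the signature, and the function name `build_probe` are assumptions made for this example; byte swapping for transmission is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed widths for the NetWare-style type names used in the listing. */
typedef uint32_t LONG;
typedef uint16_t WORD;
typedef uint8_t  BYTE;

struct HEART_BEAT {
    LONG signature;
    LONG seqNo;
    LONG pSource;
    LONG pDestn;
};

struct IPX_HEADER {
    WORD checkSum;
    WORD packetLength;
    BYTE transportControl;
    BYTE packetType;
    BYTE destinationNetwork[4];
    BYTE destinationNode[6];
    WORD destSocket;
    BYTE sourceNetwork[4];
    BYTE sourceNode[6];
    WORD sourceSocket;
};

struct PROBE_PACKET {
    struct IPX_HEADER ipxHeader;
    struct HEART_BEAT heartBeat;
};

#define PROBE_SIGNATURE 0x4E4D5350u  /* ASCII "NMSP"; byte order assumed */
#define IPX_PACKET_TYPE 4            /* assumed value for illustration */

/* Fill in a probe from src_node to dst_node; unset fields stay zero. */
void build_probe(struct PROBE_PACKET *pkt, const BYTE src_node[6],
                 const BYTE dst_node[6], WORD socket, LONG seq_no)
{
    assert(pkt != NULL && src_node != NULL && dst_node != NULL);

    memset(pkt, 0, sizeof *pkt);   /* networks, transportControl, pointers */
    pkt->ipxHeader.checkSum = 0xFFFF;
    pkt->ipxHeader.packetLength =
        (WORD)(sizeof(struct IPX_HEADER) + sizeof(struct HEART_BEAT));
    pkt->ipxHeader.packetType = IPX_PACKET_TYPE;
    memcpy(pkt->ipxHeader.destinationNode, dst_node, 6);
    memcpy(pkt->ipxHeader.sourceNode, src_node, 6);
    pkt->ipxHeader.destSocket = socket;
    pkt->ipxHeader.sourceSocket = socket;
    pkt->heartBeat.signature = PROBE_SIGNATURE;
    pkt->heartBeat.seqNo = seq_no;
}
```

The `pSource` and `pDestn` fields are left at zero here because the board structures they refer to are not described in enough detail to model. Note also that a compiler may insert padding between the two members of `PROBE_PACKET`, so `packetLength` is computed from the two member sizes rather than from `sizeof(struct PROBE_PACKET)`.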
If any packets are not received, the MULTISPAN system retransmits the probe packet a specified number of times. If there is a repeated failure, the MULTISPAN system determines which NIC failed by analyzing which packets were received and which were not, removes the failing board from the bound group, and deactivates the adapter by placing it in a wait mode. The MULTISPAN system thereafter monitors the deactivated board to determine if data packet reception begins to occur again on the deactivated board. If there is no packet reception for a specified time, MULTISPAN marks the board as dead. If the primary NIC is marked as dead, and there is at least one active secondary, then MULTISPAN performs a switch-over by causing a secondary NIC to become the primary. This is accomplished by shutting down the board, changing the node address of the secondary NIC to that of the primary NIC in the Driver Configuration Table (DCT), and then resetting the NIC. In addition, the multicast table of the original primary NIC is transferred to the switched-over primary, and promiscuous mode is turned on if it was originally active for the primary.
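The switch-over steps just described can be sketched as follows. The `board` structure is a hypothetical simplification of the per-board state (it is not the Driver Configuration Table layout), and the `resets` counter merely stands in for the actual shutdown and reset of the NIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_LEN  6
#define MAX_MCAST 16

/* Hypothetical per-board state used only for this illustration. */
struct board {
    uint8_t  node[NODE_LEN];              /* node (MAC) address */
    uint8_t  mcast[MAX_MCAST][NODE_LEN];  /* multicast address table */
    size_t   mcast_count;
    bool     promiscuous;
    bool     is_primary;
    bool     dead;
    unsigned resets;                      /* counts simulated NIC resets */
};

/* Make a live secondary take over from a primary that was marked dead. */
void switch_over(struct board *old_primary, struct board *secondary)
{
    assert(old_primary != NULL && secondary != NULL);
    assert(old_primary->dead && !secondary->dead);

    /* Adopt the primary's node address so peers see no address change. */
    memcpy(secondary->node, old_primary->node, NODE_LEN);

    /* Transfer the multicast table of the original primary. */
    memcpy(secondary->mcast, old_primary->mcast, sizeof old_primary->mcast);
    secondary->mcast_count = old_primary->mcast_count;

    /* Turn promiscuous mode on only if the primary had it active. */
    if (old_primary->promiscuous)
        secondary->promiscuous = true;

    old_primary->is_primary = false;
    secondary->is_primary = true;
    secondary->resets++;   /* stand-in for resetting the NIC */
}
```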
In one embodiment, the MULTISPAN system also resets the source node address field in the TCBs (Transmission Control Blocks) maintained by the TSM for both the failed and the switched-over adapters. This ensures that all load sharing NICs send packets with their current address rather than the original address identified at load time, thus eliminating confusion with certain protocols (such as Ethertalk) which direct requests to the node from which a reply was received.
Once the MULTISPAN system detects data packet reception on the old primary NIC, it activates the card to be a part of the group. The reactivated card then becomes a new secondary. If load sharing is enabled, the MULTISPAN system begins to use the board to share the outbound traffic. The fail-over process works the same way on this new configuration as before.
In order to load share the outbound traffic, MULTISPAN requires at least one secondary in a group. This feature can be enabled or disabled during runtime through the MULTISPAN LOAD SHARING command, which toggles this mode. When a packet is sent from the protocol stack 100 to the primary NIC 124 (the board which is known to the protocol stack), the MULTISPAN system intercepts the request and selects the next active board from the group on which the packet could be sent and changes the board number to the one selected. In one embodiment, the algorithm is based on a round-robin mechanism where every NIC in the group gets a turn to send packets. If a selected board in the bound group is marked "DISABLED", the MULTISPAN system bypasses that board and selects the next active board in the group. In another embodiment, the algorithm used makes a calculation based on the destination address in order to make routing of outgoing packets predictable to switches or routers connected to the group's NICs.
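Both selection strategies mentioned above can be sketched in a few lines of C. The `group` structure and function names are hypothetical, and the hash used for the destination-based variant is an arbitrary choice for illustration; the source does not specify the calculation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a bound group for outbound board selection. */
struct group {
    const bool *active;   /* active[i] is false for a DISABLED board */
    size_t count;
    size_t last;          /* index of the board used for the last send */
};

/* Round-robin: next active board after the last one used, or -1. */
int next_board_round_robin(struct group *g)
{
    assert(g != NULL);
    for (size_t step = 1; step <= g->count; step++) {
        size_t idx = (g->last + step) % g->count;
        if (g->active[idx]) {
            g->last = idx;
            return (int)idx;
        }
    }
    return -1;            /* no active board in the group */
}

/*
 * Destination-based: the same destination address maps to the same
 * board for as long as the set of active boards does not change.
 */
int board_for_destination(const struct group *g, const uint8_t dest[6])
{
    assert(g != NULL && dest != NULL);

    size_t active_count = 0;
    for (size_t i = 0; i < g->count; i++)
        if (g->active[i])
            active_count++;
    if (active_count == 0)
        return -1;

    uint32_t hash = 0;
    for (size_t i = 0; i < 6; i++)
        hash = hash * 31u + dest[i];

    size_t pick = hash % active_count;   /* index among active boards */
    for (size_t i = 0; i < g->count; i++) {
        if (!g->active[i])
            continue;
        if (pick == 0)
            return (int)i;
        pick--;
    }
    return -1;
}
```

Round-robin spreads packets evenly but may send consecutive packets for one destination through different NICs, whereas the destination-based choice keeps each destination on one NIC, which is what makes the traffic predictable to attached switches.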
During load sharing, the MULTISPAN system changes the SendCompleteHandler in the Event Control Block (ECB) of the data packet to point to the MULTISPAN primary NIC SendCompleteHandler. The purpose of this is to restore the original board number when the ECBs are handed back to the protocol stack through the LSL. This also fixes a problem that occurs when the system is running with Novell's IPXRTR product, wherein the Netware Core Protocol (NCP) does not recognize SendCompletes on a secondary NIC to which the protocol stacks are not bound.
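The handler substitution can be pictured with the following sketch. The `ecb` structure is a hypothetical reduction of an Event Control Block to the two fields that matter here plus two save slots; the real ECB layout and handler signature are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

struct ecb;
typedef void (*send_complete_fn)(struct ecb *e);

/* Hypothetical, heavily reduced ECB for illustration only. */
struct ecb {
    int              board;          /* board the packet is sent on */
    send_complete_fn on_complete;    /* called when the send finishes */
    int              saved_board;    /* board the protocol stack chose */
    send_complete_fn saved_handler;  /* handler the protocol stack set */
};

/* Stand-in for a protocol stack handler: records the board it saw. */
int last_completed_board = -1;
void protocol_send_complete(struct ecb *e)
{
    last_completed_board = e->board;
}

/* Restores the original board and handler, then notifies the stack. */
void multispan_send_complete(struct ecb *e)
{
    e->board = e->saved_board;
    e->on_complete = e->saved_handler;
    if (e->on_complete != NULL)
        e->on_complete(e);
}

/* Redirect an outbound ECB to the board chosen for load sharing. */
void redirect_send(struct ecb *e, int selected_board)
{
    assert(e != NULL);
    e->saved_board = e->board;
    e->saved_handler = e->on_complete;
    e->board = selected_board;
    e->on_complete = multispan_send_complete;
}
```

Because the original board number is restored before the stack's own handler runs, the protocol stack only ever sees completions on the board to which it is bound.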
Although the MULTISPAN system has been described above in relation to a Novell Netware implementation, the system is not so limited. For example, the MULTISPAN process can be implemented within other network operating systems such as Microsoft Windows NT, as discussed below.
Windows NT Implementation
FIG. 4 is a block diagram illustrating some of the major functional components of a Microsoft® Windows® NT system for transferring data between a plurality of protocol stacks 500, 502 and 504 and a plurality of NICs 505a,b in accordance with an aspect of the present invention. The protocol stacks include TCP/IP protocol stack 500, IPX/SPX (Sequenced Packet Exchange) protocol stack 502 and NetBEUI protocol stack 504. These protocol stacks connect to NDIS 506, which is part of the Microsoft® Windows® NT operating system. NDIS 506 connects to NICs 18, 20 & 22 and additionally connects to a MULTISPAN system 508, which performs load sharing and fail-over functions.
A variety of references and device driver development kits are available from Microsoft describing the LAN driver model, NDIS, and how they interact. These will be familiar to anyone of ordinary skill in writing such drivers for Windows NT. The MULTISPAN system 508 is an NDIS 4.0 intermediate driver. FIG. 4 illustrates the relationship between the NDIS wrapper, transport protocols, NIC driver, and MULTISPAN driver in a Windows® NT system.
When the MULTISPAN driver 508 loads, it registers itself as an NDIS 4.0 intermediate driver. It creates a virtual adapter 510 on its upper edge for each group of NICs. The virtual adapter 510 binds to the transport protocols 500, 502 and 504 (e.g., TCP/IP, IPX/SPX). The lower edge 512 of the MULTISPAN driver 508 behaves like a transport protocol and binds to network interface cards 505a,b. When, for example, the TCP/IP protocol stack 500 sends out packets, they are intercepted by the MULTISPAN driver 508 first. The MULTISPAN driver 508 then sends them to the appropriate network adapter 505a or 505b. All the packets received by the NICs are passed to the bound MULTISPAN driver 508. The MULTISPAN driver then decides whether it should forward the packets to the transport protocols, depending on the state of the adapter.
The MULTISPAN driver is also responsible for verifying the availability of bound NICs. It detects adapter failures by periodically monitoring the activity of the NICs, as will be discussed in more detail below. If an adapter has failed, the MULTISPAN driver 508 disables the adapter and records it in the event log. If the failed NIC was a primary adapter, the MULTISPAN driver selects a secondary NIC to become the primary adapter.
Since Windows® NT does not allow the network address of a NIC to be changed dynamically, all the NICs bound to the MULTISPAN driver are configured to the same physical address when they are loaded. When the primary adapter fails, the MULTISPAN driver disables it and starts sending and receiving packets through a secondary adapter.
The MULTISPAN driver 508 continuously tracks the state of bound network interface cards. There are three different states for network interface cards. The "IN_USE" state means that the adapter is the primary adapter. All packets will be sent and received through this adapter when the load sharing feature is disabled. When load sharing is enabled, packets are sent out from all available NICs. The "READY" state means the adapter is in standby mode, but is operating correctly. When the primary adapter fails, one of the adapters in the "READY" state is changed to the "IN_USE" state and begins to send and receive packets. When an adapter cannot send or receive packets, it is set to the "DISABLED" state. The MULTISPAN driver sends packets out from the primary adapter (the NIC in the "IN_USE" state). It simply passes packets received from the primary adapter up to the protocols and discards packets received from all the other adapters.
The MULTISPAN driver 508 continuously monitors the activity of any bound adapters. In most LAN segments, "broadcast" packets are periodically sent out by different machines. All the NICs attached to the LAN segment should receive these packets. Therefore, if a network adapter has not received any packets for an extended period of time, it might not be functioning correctly. The MULTISPAN driver 508 uses this information to determine if the bound network interface card is functioning correctly. For those LAN segments where no stations send out broadcast packets, the MULTISPAN driver sends out probe packets, as discussed above in the Novell Netware implementation. All the NICs should receive probe packets, since they are broadcast packets. A NIC will be disabled if it does not receive these probe packets.
When the network adapter is in an "IN_USE" state, and its receiver idle time exceeds a pre-set threshold, that adapter might not be operating correctly. The receiver idle time for a NIC is the time that has elapsed since the last packet was received by the NIC. The MULTISPAN driver then scans through all the adapters in the "READY" state. If the receiver idle time of an adapter in a "READY" state is shorter than that of the primary adapter, the MULTISPAN driver disables the primary adapter by setting it to the "DISABLED" state and changes the adapter in the "READY" state to the "IN_USE" state. This adapter then becomes the primary adapter. The MULTISPAN system will now begin using the new network adapter to send and receive packets.
If the adapter is in a "READY" state and has not received any packets for a period of time, the MULTISPAN driver places the adapter in a "DISABLED" state. If the adapter is fixed and starts receiving packets, it is changed to the "READY" state.
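The state transitions described in the preceding paragraphs can be summarized in a short C sketch. The `adapter` structure, the millisecond units and the function name are hypothetical; the sketch promotes the first "READY" adapter found with a shorter idle time than the primary, which is one possible reading of the text.

```c
#include <assert.h>
#include <stddef.h>

enum adapter_state { ADAPTER_IN_USE, ADAPTER_READY, ADAPTER_DISABLED };

/* Hypothetical per-adapter record for this illustration. */
struct adapter {
    enum adapter_state state;
    unsigned idle_ms;        /* time since the last packet was received */
};

/* Apply one round of state checks to every adapter in a group. */
void update_adapter_states(struct adapter *a, size_t n, unsigned threshold_ms)
{
    assert(a != NULL || n == 0);

    /* Fail over if the primary is idle too long and a standby is livelier. */
    for (size_t i = 0; i < n; i++) {
        if (a[i].state != ADAPTER_IN_USE || a[i].idle_ms <= threshold_ms)
            continue;
        for (size_t j = 0; j < n; j++) {
            if (a[j].state == ADAPTER_READY && a[j].idle_ms < a[i].idle_ms) {
                a[i].state = ADAPTER_DISABLED;
                a[j].state = ADAPTER_IN_USE;
                break;
            }
        }
        break;               /* a group has at most one primary */
    }

    /* Disable silent standbys; bring recovered adapters back as standbys. */
    for (size_t i = 0; i < n; i++) {
        if (a[i].state == ADAPTER_READY && a[i].idle_ms > threshold_ms)
            a[i].state = ADAPTER_DISABLED;
        else if (a[i].state == ADAPTER_DISABLED && a[i].idle_ms <= threshold_ms)
            a[i].state = ADAPTER_READY;
    }
}
```

A recovered adapter returns as "READY" rather than "IN_USE", so a repaired former primary rejoins the group as a standby instead of displacing the adapter that took over.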
The MULTISPAN driver uses an adapter packet filter to reduce the overhead introduced by the secondary adapters. The MULTISPAN driver sets the packet filter depending on the state of the adapter. When the adapter is in the "IN_USE" state, the filter is set by transport protocols. Normally, transport protocols set the filter to receive broadcast, multicast and directed packets. When the adapter is in the "READY" state, the packet filter is set to receive only multicast and broadcast packets. This should minimize the impact on performance. An adapter in the "DISABLED" state will receive all broadcast, multicast and directed packets. Once the adapter is replaced or the cable is reconnected so that the adapter can again receive packets, it is switched to the "READY" state and its packet filter is set accordingly.
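The filter policy can be written as a small mapping from adapter state to filter bits. The flag names and values below are hypothetical placeholders chosen for this sketch and are not the NDIS packet filter constants.

```c
#include <assert.h>

enum adapter_state { ADAPTER_IN_USE, ADAPTER_READY, ADAPTER_DISABLED };

/* Hypothetical filter bits for illustration only. */
#define FILTER_DIRECTED  0x1u
#define FILTER_MULTICAST 0x2u
#define FILTER_BROADCAST 0x4u

/*
 * Returns the packet filter for an adapter in the given state.
 * protocol_filter is whatever the transport protocols requested.
 */
unsigned filter_for_state(enum adapter_state state, unsigned protocol_filter)
{
    switch (state) {
    case ADAPTER_IN_USE:
        return protocol_filter;     /* primary: protocols decide */
    case ADAPTER_READY:
        return FILTER_MULTICAST | FILTER_BROADCAST;
    case ADAPTER_DISABLED:
        /* Accept everything so that any reception signals recovery. */
        return FILTER_DIRECTED | FILTER_MULTICAST | FILTER_BROADCAST;
    }
    assert(!"unknown adapter state");
    return 0;
}
```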
Windows NT uses a registry database to store configuration information. Each driver in Windows NT has at least one entry in the following subkey:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services
The drivers can store configurable parameter values under the driver's subkey. NDIS drivers also store binding information inside the subkey. For a normal NDIS NIC driver, one entry is created for the NDIS miniport interface and one subkey is created for each adapter that is installed.
As discussed above, MULTISPAN is an NDIS intermediate driver which has a miniport interface on its upper edge and a transport interface on its lower edge. Each interface needs a separate subkey to describe it.
After installing the MULTISPAN driver, the installation program (oemsetup.inf) creates
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\mspan
for its NDIS transport interface and
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\mspm
for its NDIS miniport interface. It also creates
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\mspa#
for each virtual adapter installed, where # is the adapter number assigned by Windows NT. For each NIC bound to the MULTISPAN driver, a Span subkey is created under
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NETCARD#\Parameters
to configure how the NIC is bound to the MULTISPAN virtual adapter.
There are two entries in the Parameters subkey. "Connect" stores the name of the virtual MULTISPAN adapter to which the NIC is connected. All network interface cards belonging to the same group will have the same Connect value. "Number" stores the sequence number of the adapter. Number zero means that this adapter is the primary adapter of the adapter group. For example, the registry might resemble the following:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\E100B1\Parameters
Connect: REG_SZ: mspa3
Number: REG_DWORD: 0x1
The installation script also creates a Network Address under the Parameters subkey of all bound adapters. This stores the actual MAC address used for the adapter group.
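To make the role of the Connect and Number entries concrete, the following sketch groups NICs by the virtual adapter they are bound to and picks out the primary. It models the registry as a plain Python dictionary rather than reading the real Windows NT registry; the function names and the second NIC entry (E100B2) are hypothetical, and the entries are assumed to sit directly under Parameters as in the example above.

```python
# Hypothetical snapshot of ...\CurrentControlSet\Services, as nested dicts.
SERVICES = {
    "E100B1": {"Parameters": {"Connect": "mspa3", "Number": 1}},
    "E100B2": {"Parameters": {"Connect": "mspa3", "Number": 0}},
    "mspa3": {"Parameters": {}},  # the virtual adapter itself has no Connect
}


def group_by_virtual_adapter(services):
    """Map each virtual adapter name to its bound NICs, ordered by Number."""
    groups = {}
    for nic, subkeys in services.items():
        params = subkeys.get("Parameters", {})
        if "Connect" not in params:
            continue  # not bound to a MULTISPAN virtual adapter
        groups.setdefault(params["Connect"], []).append((params["Number"], nic))
    return {virtual: [nic for _, nic in sorted(members)]
            for virtual, members in groups.items()}


def primary_of(services, virtual_adapter):
    """Return the NIC whose Number is zero in the given group, or None."""
    for nic, subkeys in services.items():
        params = subkeys.get("Parameters", {})
        if params.get("Connect") == virtual_adapter and params.get("Number") == 0:
            return nic
    return None
```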
The MULTISPAN driver stores configurable parameters in
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\mspa#\Parameters.
The subkey holds the following values of the REG_DWORD data type: CheckTime, DisableTime, IdleTime, ProbeTime and LoadBalance. Network Address is a value of the REG_SZ type in the same subkey. These values are described in detail in the following section.
There are five different parameters in the Windows NT registry which control the behavior of the MULTISPAN driver. The user can set these parameters based on the operational environment.
Check Time determines how often the MULTISPAN driver checks if the adapter is still alive. The recommended value is 1000 milliseconds (1 second). The maximum value is 1000 seconds in some embodiments.
Probe Time determines how long a bound adapter may go without receiving a packet before the MULTISPAN driver sends out a probe packet. For example, if the Probe Time is set to 2000 milliseconds, the MULTISPAN driver will send out a probe packet if the adapter has not received any packets during a two second interval. If the Probe Time is set to 0, no probe packet will be sent out. Unless it is zero, the Probe Time value should be greater than or equal to the Check Time. The default value is 3000 milliseconds.
Disable Time determines when the MULTISPAN driver is to disable a bound adapter. If the adapter has not received any packets in the specified time, the MULTISPAN driver disables the adapter. The default value is 8000 milliseconds.
Idle Time determines when the MULTISPAN driver should switch to the secondary adapter if the primary adapter has not received any packets within the specified time period. The Idle Time value should be greater than the Check Time and Probe Time values. The default value is 4000 milliseconds.
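The relationships among these timing parameters can be checked mechanically. The sketch below is an assumption-laden illustration, not part of the driver: the class name MultispanTimers and its problems method are hypothetical, the defaults are the recommended and default values stated above, and the Check Time ceiling uses the 1000 second maximum mentioned for some embodiments.

```python
from dataclasses import dataclass

MAX_CHECK_MS = 1000 * 1000  # 1000 seconds, the maximum in some embodiments


@dataclass(frozen=True)
class MultispanTimers:
    check_time: int = 1000    # ms, recommended value
    probe_time: int = 3000    # ms, default; 0 disables probe packets
    idle_time: int = 4000     # ms, default
    disable_time: int = 8000  # ms, default

    def problems(self):
        """Return a list of violated constraints (empty if none)."""
        issues = []
        if self.check_time > MAX_CHECK_MS:
            issues.append("Check Time exceeds the 1000 second maximum")
        if self.probe_time != 0 and self.probe_time < self.check_time:
            issues.append("Probe Time should be >= Check Time unless it is 0")
        if self.idle_time <= self.check_time:
            issues.append("Idle Time should be greater than Check Time")
        if self.idle_time <= self.probe_time:
            issues.append("Idle Time should be greater than Probe Time")
        return issues
```

With the defaults, problems() returns an empty list; setting probe_time to 500 while check_time stays at 1000 reports the Probe Time constraint.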
FIG. 5 illustrates the structure of a probe packet for an Ethernet system in accordance with an aspect of the present invention. The packet includes a number of fields, including a destination address 700, source address 702, packet type 704 and adapter ID 706.
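As an illustration of the field order in FIG. 5, the following sketch packs and unpacks a probe packet. Several details are assumptions not given in the text: the packet type is treated as a 2-byte field and set to 0x88B5 (an IEEE local experimental EtherType used here purely as a placeholder), the adapter ID is taken to be a 4-byte unsigned integer, and the function names are hypothetical. The broadcast destination follows from the statement that probe packets are broadcast packets.

```python
import struct

BROADCAST = b"\xff" * 6        # Ethernet broadcast destination address
PROBE_TYPE = 0x88B5            # placeholder packet type (assumption)
_LAYOUT = struct.Struct("!6s6sHI")  # destination, source, type, adapter ID


def build_probe(source_mac, adapter_id):
    """Pack the four fields in the order shown in FIG. 5."""
    return _LAYOUT.pack(BROADCAST, source_mac, PROBE_TYPE, adapter_id)


def parse_probe(frame):
    """Return the probe fields, or None if the frame is not a probe."""
    if len(frame) < _LAYOUT.size:
        return None
    destination, source, packet_type, adapter_id = _LAYOUT.unpack(
        frame[:_LAYOUT.size])
    if packet_type != PROBE_TYPE:
        return None
    return {"destination": destination, "source": source,
            "adapter_id": adapter_id}
```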
Since Windows NT does not allow the network address of a NIC to be changed dynamically, all the NICs that are bound to the same MULTISPAN virtual adapter are configured to the same physical address when they are loaded, which is called the MULTISPAN Virtual Network Address. Ethernet hardware addresses are 48 bits, expressed as 12 hexadecimal digits. The first 2 digits have to be 02 to represent a locally administered address. It is recommended that 00 be used as the last two digits to support load sharing. The MULTISPAN Virtual Network Address should appear as follows:
0x02XXXXXXXX00
where XXXXXXXX are arbitrary hexadecimal digits. This address must be unique within a single Ethernet segment.
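The address format can be checked with a short helper. This is a sketch with hypothetical function names; it treats the leading 02 as required and the trailing 00 as a recommendation, as described above, and accepts the address with or without the 0x prefix.

```python
import re

_ADDRESS = re.compile(r"(?:0x)?(02[0-9A-Fa-f]{10})")


def is_valid_virtual_address(text):
    """True if text is 12 hex digits beginning with 02 (locally administered)."""
    return _ADDRESS.fullmatch(text) is not None


def supports_load_sharing(text):
    """True if the address is valid and also ends in the recommended 00."""
    return is_valid_virtual_address(text) and text.endswith("00")


def make_virtual_address(middle):
    """Build 0x02XXXXXXXX00 from a 32-bit value for the eight free digits."""
    if not 0 <= middle <= 0xFFFFFFFF:
        raise ValueError("middle must fit in 8 hexadecimal digits")
    return f"0x02{middle:08X}00"
```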
FIG. 6 illustrates the structure of a probe packet for FDDI and token ring networks. The probe packet illustrated in FIG. 6 includes Access Control (ACS CTL) 800, Frame Control (FRM CTL) 802, destination address 804, source address 806, Destination Service Access Point (DSAP) 808, Source Service Access Point (SSAP) 810, CTL 812, protocol 814, packet type 816 and adapter ID 818.
Since FDDI and Token-Ring networks do not allow two adapters with the same network address to coexist on the same network segment, the mechanism described in the Ethernet section cannot be used to handle the fail-over process. The MULTISPAN driver therefore uses special FDDI and Token-Ring NIC drivers to provide the mechanism for resetting the NIC and changing the network address. On startup, only the primary adapter's address is overwritten with the MULTISPAN Virtual Network Address. All the other adapters use an address that is generated from the Virtual Network Address and the adapter number assigned by NT. When the primary card has failed, MULTISPAN resets the primary adapter and changes its address to the address generated from the Virtual Network Address and its adapter number; it then resets the secondary adapter, changes its network address to the MULTISPAN Virtual Network Address and uses that card as the primary adapter.
FDDI network addresses are 48 bits long, expressed as 12 hexadecimal digits. The first 2 digits have to be 02 to represent the address of a locally administered station. It is recommended that 00 be used as the last two digits to support load sharing. The MULTISPAN Virtual Network Address should appear as follows:
0x02XXXXXXXX00
where XXXXXXXX are arbitrary hexadecimal numbers. This address must be unique within a single ring segment.
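The address hand-over for FDDI and Token-Ring can be sketched as below. The text does not say how a secondary adapter's address is generated from the Virtual Network Address and the adapter number; placing the adapter number in the last two hexadecimal digits is purely an assumption made for this illustration, and all function names are hypothetical.

```python
def derived_address(virtual, adapter_number):
    """Hypothetical rule: replace the trailing 00 with the adapter number."""
    if not 1 <= adapter_number <= 0xFF:
        raise ValueError("adapter number must be in 1..255 for this sketch")
    return virtual[:-2] + f"{adapter_number:02X}"


def initial_addresses(virtual, adapter_numbers, primary):
    """On startup only the primary takes the virtual address."""
    return {n: virtual if n == primary else derived_address(virtual, n)
            for n in adapter_numbers}


def fail_over(addresses, virtual, failed_primary, new_primary):
    """Move the virtual address from the failed primary to a secondary."""
    updated = dict(addresses)
    # First move the failed primary off the virtual address ...
    updated[failed_primary] = derived_address(virtual, failed_primary)
    # ... then give the virtual address to the new primary, so that no two
    # adapters ever hold the same address on the ring.
    updated[new_primary] = virtual
    return updated
```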
FIG. 7 is a flowchart illustrating one embodiment of a method for determining whether a network adapter has failed. The network adapters are divided into a primary adapter and a plurality of secondary adapters. The method illustrated in FIG. 7 determines whether the primary adapter has failed. The method begins at state 900 which is a start state.
The system next advances to state 910 in which a packet is sent from the primary to a secondary adapter. In one embodiment, the primary sends packets to all of the secondary adapters in sequence. Next, the system advances to state 912. In state 912, the system attempts to receive a packet from the secondary adapter. The system next advances to state 914. At state 914, the system sends a packet from a secondary adapter to the primary adapter. In one embodiment, all of the secondary adapters send a packet to the primary adapter. The system next advances to state 916. At state 916, the system attempts to receive a packet from the primary adapter. The system next advances to state 918.
At state 918, the system determines whether a packet has been received from either the primary adapter or the secondary adapter. If no packets have been received from either the primary or secondary adapter, the system assumes that the primary adapter has failed. The system then advances to step 924. At step 924, the system converts a secondary adapter to a replacement primary adapter. The system then proceeds to state 922, which is an end state. At state 918, if a packet has been received from either the primary or the secondary adapter, then the system assumes that the primary adapter has not failed and it proceeds to the end state 922.
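The flow of FIG. 7 can be summarized in code. This is a simplified sketch under stated assumptions: the deliver callback is a hypothetical stand-in for sending a packet from one adapter and attempting to receive it at the other, and promoting the first secondary in the list is an assumption, since the text does not say which secondary becomes the replacement.

```python
def check_primary(primary, secondaries, deliver):
    """Return the adapter that should act as primary after one check.

    deliver(src, dst) sends a packet from src to dst and returns True
    if the packet was received.
    """
    if not secondaries:
        raise ValueError("at least one secondary adapter is required")

    # States 910/912: primary sends to every secondary in sequence.
    from_primary = [deliver(primary, s) for s in secondaries]
    # States 914/916: every secondary sends to the primary.
    from_secondaries = [deliver(s, primary) for s in secondaries]

    # State 918: the primary is assumed failed only if nothing got through
    # in either direction.
    if not any(from_primary) and not any(from_secondaries):
        return secondaries[0]  # step 924: promote a secondary (assumption)
    return primary             # state 922: primary is still healthy
```

For instance, if deliver fails whenever the adapter named nic0 is involved, check_primary("nic0", ["nic1", "nic2"], deliver) returns "nic1"; if any packet gets through in either direction, it returns "nic0".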
One embodiment of the present invention operates at