United States Patent
6292905
Wallach, et al.
September 18, 2001
Title
Method for providing a fault tolerant network using distributed server processes to remap clustered network resources to other servers during server failure
Abstract
The method of the current invention provides fault tolerant access to a network resource. A replicated network directory database operates in conjunction with server resident processes to remap a network resource in the event of a server failure. The records/objects in the replicated database contain, for each network resource, a primary and a secondary server affiliation. Initially, all users access a network resource through the server identified in the replicated database as being the primary server for the network resource. When server resident processes detect a failure of the primary server, the replicated database is updated to reflect the failure of the primary server, and to change the affiliation of the network resource from its primary to its backup server. This remapping occurs transparently to whichever user/client is accessing the network resource. As a result of the remapping, all users access the network resource through the server identified in the replicated database as the backup server for the resource. When the server resident processes detect a return to service of the primary server, the replicated database is again updated to reflect the resumed operation of the primary server. This remapping of network resource affiliations also occurs transparently to whichever user/client is accessing the network resource, and returns the resource to its original fault tolerant state.
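The remapping described in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the class names `ResourceRecord` and `Directory`, the field names, and the server and volume names are all hypothetical, and a plain in-memory dictionary stands in for the replicated directory database.

```python
from dataclasses import dataclass


@dataclass
class ResourceRecord:
    """One directory entry; field names are illustrative, not from the patent."""
    name: str
    primary: str  # server that normally hosts the resource
    backup: str   # server that takes over if the primary fails
    host: str     # server currently serving the resource


class Directory:
    """In-memory stand-in for the replicated network directory database."""

    def __init__(self):
        self.records = {}

    def add(self, name, primary, backup):
        self.records[name] = ResourceRecord(name, primary, backup, host=primary)

    def lookup(self, name):
        # Clients always resolve the current host, so a remap is transparent.
        return self.records[name].host

    def on_server_failed(self, server):
        # Move each resource hosted on its failed primary to its backup.
        for rec in self.records.values():
            if rec.primary == server and rec.host == server:
                rec.host = rec.backup

    def on_server_recovered(self, server):
        # Return resources to their primary once it is back in service.
        for rec in self.records.values():
            if rec.primary == server:
                rec.host = rec.primary


directory = Directory()
directory.add("VOL1", primary="SERVER_A", backup="SERVER_B")
before = directory.lookup("VOL1")
directory.on_server_failed("SERVER_A")
during = directory.lookup("VOL1")
directory.on_server_recovered("SERVER_A")
after = directory.lookup("VOL1")
print(before, during, after)
```

Because clients only ever call `lookup`, they need no knowledge of which server failed. This mirrors the transparency the abstract describes, though a real system would also have to replicate the record updates across servers.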
Inventors:
Wallach; Walter A. (Los Altos, CA), Findlay; Bruce (Palo Alto, CA), Pellicer; Thomas J. (Campbell, CA), Chrabaszcz; Michael (Milpitas, CA)
Assignee:
Micron Technology, Inc. (Boise, ID)
Appl. No.:
942815
Filed:
October 2, 1997
Current U.S. Class:
714/4; 709/239; 714/11
Field of Search:
714/4,7,11,17,1,3,6,10; 709/239
U.S. Patent Documents
4057847
November 1977
Lowell et al.
4100597
July 1978
Fleming et al.
4449182
May 1984
Rubinson et al.
4672535
June 1987
Katzman et al.
4692918
September 1987
Elliott et al.
4695946
September 1987
Andreasen et al.
4707803
November 1987
Anthony, Jr. et al.
4769764
September 1988
Levanon
4774502
September 1988
Kimura
4821180
April 1989
Gerety et al.
4835737
May 1989
Herrig et al.
4894792
January 1990
Mitchell et al.
4949245
August 1990
Martin et al.
4999787
March 1991
McNally et al.
5006961
April 1991
Monico
5007431
April 1991
Donehoo, III
5033048
July 1991
Pierce et al.
5051720
September 1991
Kittirutsunetorn
5073932
December 1991
Yossifor et al.
5103391
April 1992
Barrett
5118970
June 1992
Olson et al.
5121500
June 1992
Arlington et al.
5123017
June 1992
Simpkins et al.
5136708
August 1992
Lapourtre et al.
5136715
August 1992
Hirose et al.
5138619
August 1992
Fasang et al.
5157663
October 1992
Major et al.
5210855
May 1993
Bartol
5222897
June 1993
Collins et al.
5245615
September 1993
Treu
5247683
September 1993
Holmes et al.
5253348
October 1993
Scalise
5261094
November 1993
Everson et al.
5265098
November 1993
Mattson et al.
5266838
November 1993
Gerner
5269011
December 1993
Yanai et al.
5272382
December 1993
Heald et al.
5272584
December 1993
Austruy et al.
5276814
January 1994
Bourke et al.
5276863
January 1994
Heider
5277615
January 1994
Hastings et al.
5280621
January 1994
Barnes et al.
5283905
February 1994
Saadeh et al.
5307354
April 1994
Cramer et al.
5311397
May 1994
Harshberger et al.
5311451
May 1994
Barrett
5317693
May 1994
Cuenod et al.
5329625
July 1994
Kannan et al.
5337413
August 1994
Lui et al.
5351276
September 1994
Doll, Jr. et al.
5367670
November 1994
Ward et al.
5379184
January 1995
Barraza et al.
5379409
January 1995
Ishikawa
5386567
January 1995
Lien et al.
5388267
February 1995
Chan et al.
5402431
March 1995
Saadeh et al.
5404494
April 1995
Garney
5423025
June 1995
Goldman et al.
5430717
July 1995
Fowler et al.
5430845
July 1995
Rimmer et al.
5432715
July 1995
Shigematsu et al.
5432946
July 1995
Allard et al.
5438678
August 1995
Smith
5440748
August 1995
Sekine et al.
5448723
September 1995
Rowett
5455933
October 1995
Schieve et al.
5460441
October 1995
Hastings et al.
5463768
October 1995
Schieve et al.
5465349
November 1995
Geronimi et al.
5471617
November 1995
Farrand et al.
5471634
November 1995
Giorgio et al.
5473499
December 1995
Weir
5483419
January 1996
Kaczeus, Sr. et al.
5485550
January 1996
Dalton
5485607
January 1996
Lomet et al.
5487148
January 1996
Komori et al.
5491791
February 1996
Glowny et al.
5493574
February 1996
McKinley
5493666
February 1996
Fitch
5513314
April 1996
Kandasamy et al.
5513339
April 1996
Agrawal et al.
5515515
May 1996
Kennedy et al.
5517646
May 1996
Piccirillo et al.
5519851
May 1996
Bender et al.
5526289
June 1996
Dinh et al.
5528409
June 1996
Cucci et al.
5530810
June 1996
Bowman
5533193
July 1996
Roscoe
5533198
July 1996
Thorson
5535326
July 1996
Baskey et al.
5539883
July 1996
Allon et al.
5542055
July 1996
Amini et al.
5546272
August 1996
Moss et al.
5548712
August 1996
Larson et al.
5555510
September 1996
Verseput et al.
5559764
September 1996
Chen et al.
5559958
September 1996
Farrand et al.
5559965
September 1996
Oztaskin et al.
5560022
September 1996
Dunstan et al.
5564024
October 1996
Pemberton
5566299
October 1996
Billings et al.
5566339
October 1996
Perholtz et al.
5568610
October 1996
Brown
5568619
October 1996
Blackledge et al.
5572403
November 1996
Mills
5577205
November 1996
Hwang et al.
5579487
November 1996
Meyerson et al.
5579491
November 1996
Jeffries et al.
5579528
November 1996
Register
5581712
December 1996
Herrman
5581714
December 1996
Amini et al.
5584030
December 1996
Husak et al.
5586250
December 1996
Carbonneau et al.
5588121
December 1996
Reddin et al.
5588144
December 1996
Inoue et al.
5592610
January 1997
Chittor
5592611
January 1997
Midgely et al.
5596711
January 1997
Burckhartt et al.
5598407
January 1997
Bud et al.
5602758
February 1997
Lincoln et al.
5604873
February 1997
Fite et al.
5606672
February 1997
Wade
5608865
March 1997
Midgely et al.
5608876
March 1997
Cohen et al.
5615207
March 1997
Gephardt et al.
5621159
April 1997
Brown et al.
5621892
April 1997
Cook
5622221
April 1997
Genga, Jr. et al.
5625238
April 1997
Ady et al.
5627962
May 1997
Goodrum et al.
5628028
May 1997
Michelson
5630076
May 1997
Saulpaugh et al.
5631847
May 1997
Kikinis
5632021
May 1997
Jennings et al.
5636341
June 1997
Matsushita et al.
5638289
June 1997
Yamada et al.
5644470
July 1997
Benedict et al.
5644731
July 1997
Liencres et al.
5651006
July 1997
Fujino et al.
5652832
July 1997
Kane et al.
5652833
July 1997
Takizawa et al.
5652839
July 1997
Giorgio et al.
5652892
July 1997
Ugajin
5652908
July 1997
Douglas et al.
5655081
August 1997
Bonnell et al.
5655083
August 1997
Bagley
5655148
August 1997
Richman et al.
5659682
August 1997
Devarakonda et al.
5664118
September 1997
Nishigaki et al.
5664119
September 1997
Jeffries et al.
5666538
September 1997
DeNicola
5668943
September 1997
Attanasio et al.
5668992
September 1997
Hammer et al.
5669009
September 1997
Buktenica et al.
5671371
September 1997
Kondo et al.
5675723
October 1997
Ekrot et al.
5680288
October 1997
Carey et al.
5682328
October 1997
Roeber et al.
5684671
November 1997
Hobbs et al.
5689637
November 1997
Johnson et al.
5696895
December 1997
Hemphill et al.
5696899
December 1997
Kalwitz
5696949
December 1997
Young
5696970
December 1997
Sandage et al.
5701417
December 1997
Lewis et al.
5704031
December 1997
Mikami et al.
5708775
January 1998
Nakamura
5708776
January 1998
Kikinis
5712754
January 1998
Sides et al.
5715456
February 1998
Bennett et al.
5717570
February 1998
Kikinis
5721935
February 1998
DeSchepper et al.
5724529
March 1998
Smith et al.
5726506
March 1998
Wood
5727207
March 1998
Gates et al.
5732266
March 1998
Moore et al.
5737708
April 1998
Grob et al.
5737747
April 1998
Vishlitzky et al.
5740378
April 1998
Rehl et al.
5742514
April 1998
Bonola
5742833
April 1998
Dea et al.
5747889
May 1998
Raynham et al.
5748426
May 1998
Bedingfield et al.
5752164
May 1998
Jones
5754396
May 1998
Felcman et al.
5754449
May 1998
Hoshal et al.
5754797
May 1998
Takahashi
5758165
May 1998
Shuff
5758352
May 1998
Reynolds et al.
5761033
June 1998
Wilhelm
5761045
June 1998
Olson et al.
5761085
June 1998
Giorgio
5761462
June 1998
Neal et al.
5761707
June 1998
Aiken et al.
5764924
June 1998
Hong
5764968
June 1998
Ninomiya
5765008
June 1998
Desai et al.
5765198
June 1998
McCrocklin et al.
5767844
June 1998
Stoye
5768541
June 1998
Pan-Ratzlaff
5768542
June 1998
Enstrom et al.
5771343
June 1998
Hafner et al.
5774640
June 1998
Kurio
5774645
June 1998
Beaujard et al.
5774741
June 1998
Choi
5777897
July 1998
Giorgio
5778197
July 1998
Dunham
5781703
July 1998
Desai et al.
5781716
July 1998
Hemphill, II et al.
5781744
July 1998
Johnson et al.
5781767
July 1998
Inoue et al.
5781798
July 1998
Beatty et al.
5784383
July 1998
Meaney
5784555
July 1998
Stone
5784576
July 1998
Guthrie et al.
5787019
July 1998
Knight et al.
5787459
July 1998
Stallmo et al.
5787491
July 1998
Merkin et al.
5790775
August 1998
Marks et al.
5790831
August 1998
Lin et al.
5793948
August 1998
Asahi et al.
5793987
August 1998
Quackenbush et al.
5794035
August 1998
Golub et al.
5796185
August 1998
Takata et al.
5796580
August 1998
Komatsu et al.
5796934
August 1998
Bhanot et al.
5796981
August 1998
Abudayyeh et al.
5797023
August 1998
Berman et al.
5798828
August 1998
Thomas et al.
5799036
August 1998
Staples
5799196
August 1998
Flannery
5801921
September 1998
Miller
5802269
September 1998
Poisner et al.
5802298
September 1998
Imai et al.
5802305
September 1998
McKaughan et al.
5802324
September 1998
Wunderlich et al.
5802393
September 1998
Begun et al.
5802552
September 1998
Fandrich et al.
5802592
September 1998
Chess et al.
5803357
September 1998
Lakin
5805804
September 1998
Laursen et al.
5805834
September 1998
McKinley et al.
5809224
September 1998
Schultz et al.
5809256
September 1998
Najemy
5809287
September 1998
Stupek, Jr. et al.
5809311
September 1998
Jones
5809555
September 1998
Hobson
5812748
September 1998
Ohran et al.
5812750
September 1998
Dev et al.
5812757
September 1998
Okamoto et al.
5812858
September 1998
Nookala et al.
5815117
September 1998
Kolanek
5815647
September 1998
Buckland et al.
5815651
September 1998
Litt
5815652
September 1998
Ote et al.
5821596
October 1998
Miu et al.
5822547
October 1998
Boesch et al.
5826043
October 1998
Smith et al.
5829046
October 1998
Tzelnic et al.
5835719
November 1998
Gibson et al.
5835738
November 1998
Blackledge, Jr. et al.
5838932
November 1998
Alzien
5841964
November 1998
Yamaguchi
5841991
November 1998
Russell
5845061
December 1998
Miyamoto et al.
5845095
December 1998
Reed et al.
5850546
December 1998
Kim
5852720
December 1998
Gready et al.
5852724
December 1998
Glenn, II et al.
5857074
January 1999
Johnson
5857102
January 1999
McChesney et al.
5864653
January 1999
Tavallaei et al.
5864654
January 1999
Marchant
5864713
January 1999
Terry
5867730
February 1999
Leyda
5875307
February 1999
Ma et al.
5875308
February 1999
Egan et al.
5875310
February 1999
Buckland et al.
5878237
March 1999
Olarig
5878238
March 1999
Gan et al.
5881311
March 1999
Woods
5884027
March 1999
Garbus et al.
5884049
March 1999
Atkinson
5886424
March 1999
Kim
5889965
March 1999
Wallach et al.
5892898
April 1999
Fujii et al.
5892915
April 1999
Duso et al.
5892928
April 1999
Wallach et al.
5893140
April 1999
Vahalia et al.
5898846
April 1999
Kelly
5898888
April 1999
Guthrie et al.
5905867
May 1999
Giorgio
5907672
May 1999
Matze et al.
5909568
June 1999
Nason
5911779
June 1999
Stallmo et al.
5913034
June 1999
Malcolm
5922060
July 1999
Goodrum
5930358
July 1999
Rao
5935262
August 1999
Barrett et al.
5936960
August 1999
Stewart
5938751
August 1999
Tavallaei et al.
5941996
August 1999
Smith et al.
5964855
October 1999
Bass et al.
5983349
November 1999
Kodama et al.
5987554
November 1999
Liu et al.
5987621
November 1999
Duso et al.
5987627
November 1999
Rawlings, III
6012130
January 2000
Beyda et al.
6038624
March 2000
Chan et al.
Foreign Patent Documents
0 866 403 A1
Sep., 1998
EP
04 333 118 A
Nov., 1992
JP
05 233 110 A
Sep., 1993
JP
07 093 064 A
Apr., 1995
JP
07 261 874 A
Oct., 1995
JP
Other References
Shanley and Anderson, PCI System Architecture, Third Edition, Chapters 15 & 16, pp. 297-328, CR 1995.
PCI Hot-Plug Specification, Preliminary Revision for Review Only, Revision 0.9, pp. i-vi, and 1-25, Mar. 5, 1997.
SES SCSI-3 Enclosure Services, X3T10/Project 1212-D/Rev 8a, pp. i, iii-x, 1-76, and I-1 (index), Jan. 16, 1997.
Compaq Computer Corporation, Technology Brief, pp. 1-13, Dec. 1996, "Where Do I Plug the Cable? Solving the Logical-Physical Slot Numbering Problem."
ftp.cdrom.com/pub/os2/diskutil/, PHDX software, phdx.zip download, Mar. 1995, "Parallel Hard Disk Xfer."
Cmasters, Usenet post to microsoft.public.windowsnt.setup, Aug. 1997, "Re: FDISK switches."
Hildebrand, N., Usenet post to comp.msdos.programmer, May 1995, "Re: Structure of disk partition into."
Lewis, L., Usenet post to alt.msdos.batch, Apr. 1997, "Re: Need help with automating FDISK and FORMAT."
Netframe, http://www.netframe-support.com/technology/datasheets/data.htm, before Mar. 1997, "Netframe ClusterSystem 9008 Data Sheet."
Simos, M., Usenet post to comp.os.msdos.misc, Apr. 1997, "Re: Auto FDISK and FORMAT."
Wood, M. H., Usenet post to comp.os.netware.misc, Aug. 1996, "Re: Workstation duplication method for WIN95."
Lyons, Computer Reseller News, Issue 721, pp. 61-62, Feb. 3, 1997, "ACC Releases Low-Cost Solution for ISPs."
M2 Communications, M2 Presswire, 2 pages, Dec. 19, 1996, "Novell IntranetWare Supports Hot Pluggable PCI from NetFRAME."
Rigney, PC Magazine, 14(17): 375-379, Oct. 10, 1995, "The One for the Road (Mobile-aware capabilities in Windows 95)."
Shanley and Anderson, PCI System Architecture, Third Edition, p. 382, Copyright 1995.
Gorlick, M., Conf. Proceedings: ACM/ONR Workshop on Parallel and Distributed Debugging, pp. 175-181, 1991, "The Flight Recorder: An Architectural Aid for System Monitoring."
IBM Technical Disclosure Bulletin, 92A+62947, pp. 391-394, Oct. 1992, "Method for Card Hot Plug Detection and Control."
Davis, T., Usenet post to alt.msdos.programmer, Apr. 1997, "Re: How do I create an FDISK batch file?"
Davis, T., Usenet post to alt.msdos.batch, Apr. 1997, "Re: Need help with automating FDISK and FORMAT . . . ".
NetFrame Systems Incorporated, Doc. No. 78-1000226-01, pp. 1-2, 5-8, 359-404, and 471-512, Apr. 1996, "NetFrame Clustered Multiprocessing Software: NW0496 CD-ROM for Novell® NetWare® 4.1 SMP, 4.1, and 3.12."
Shanley and Anderson, PCI System Architecture, Third Edition, Chapter 15, pp. 297-302, Copyright 1995, "Intro To Configuration Address Space."
Shanley and Anderson, PCI System Architecture, Third Edition, Chapter 16, pp. 303-328, Copyright 1995, "Configuration Transactions."
Sun Microsystems Computer Company, Part No. 802-5355-10, Rev. A, May 1996, "Solstice SyMON User's Guide."
Sun Microsystems, Part No. 802-6569-11, Release 1.0.1, Nov. 1996, "Remote Systems Diagnostics Installation & User Guide."
Haban, D. & D. Wybranietz, IEEE Transactions on Software Engineering, 16(2):197-211, Feb. 1990, "A Hybrid Monitor for Behavior and Performance Analysis of Distributed Systems."
Primary Examiner:
Lee; Thomas
Assistant Examiner:
Nguyen; Nguyen
Attorney, Agent or Firm:
Knobbe, Martens, Olson & Bear, LLP
Parent Case Text
PRIORITY
The benefit under 35 U.S.C. § 119(e) of the following U.S. Provisional Application entitled "Clustering Of Computer Systems Using Uniform Object Naming And Distributed Software For Locating Objects," application Ser. No. 60/046,327, filed on May 13, 1997, is hereby claimed.
Claims
What is claimed is:
1. A method for fault tolerant access to a network resource, on a network with a client workstation and a first and a second server, said method for fault tolerant access comprising the acts of:
selecting a first server to provide communications between a client workstation and a network resource;
detecting a failure of the first server, comprising the acts of:
monitoring across a common bus, at a second server, communications between the first server and the network resource across the common bus by noting a continual change in state of the network resource, and
observing a termination in the communications between the first server and the network resource across the common bus by noting a stop in the continual change in state of the network resource; and
routing communications between the client workstation and the network resource via the second server.
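As an illustration only, and not part of the claim language, the monitoring and observing acts of claim 1 might be sketched as below. The assumption here is that the second server can periodically sample some observable state of the network resource (for example an activity counter visible over the shared bus) and treats a run of unchanged samples as a stop in communications. The class name `StateChangeMonitor` and the threshold `max_unchanged` are hypothetical.

```python
class StateChangeMonitor:
    """Declare the first server failed when the resource state stops changing."""

    def __init__(self, max_unchanged=3):
        self.max_unchanged = max_unchanged  # unchanged samples tolerated
        self.last_state = None
        self.unchanged = 0

    def observe(self, state):
        """Record one sample; return True once the first server is presumed failed."""
        if state != self.last_state:
            # The state moved, so the first server is still driving the resource.
            self.last_state = state
            self.unchanged = 0
        else:
            self.unchanged += 1
        return self.unchanged >= self.max_unchanged


monitor = StateChangeMonitor(max_unchanged=3)
samples = [1, 2, 3, 4, 4, 4, 4]  # the state stops changing after the fourth sample
verdicts = [monitor.observe(s) for s in samples]
print(verdicts)
```

Requiring several consecutive unchanged samples is a design choice of this sketch. It trades detection latency for fewer false failovers when the resource is merely idle.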
2. The method for fault tolerant access to a network resource of claim 1, further comprising the acts of:
identifying in a first record, the primary server for the network resource as the first server;
discovering a recovery of the first server; and
re-routing communications between the client workstation and the network resource via the first server.
3. The method for fault tolerant access to a network resource of claim 2, further comprising
providing a network resource database; and
replicating the network resource database on the first and the second servers.
4. The method for fault tolerant access to a network resource of claim 2, wherein said act of detecting a recovery of the first server, further includes the acts of:
sending packets intermittently from the second server to the first server; and
re-acquiring acknowledgments from the first server at the second server, the acknowledgments responsive to said sending act and to the recovery of said first server.
5. The method for fault tolerant access to a network resource of claim 1, further comprising:
choosing the first server as the primary server and the second server as the backup server, for the network resource; and
storing in a first field of the first record the primary server for the network resource and storing in a second field of the first record the backup server for the network resource.
6. The method for fault tolerant access to a network resource of claim 5, wherein said choosing act, includes the act of:
allowing a network administrator to select the primary and the backup server.
7. The method for fault tolerant access to a network resource of claim 5, wherein said act of detecting a failure of the first server, further includes the acts of:
reading the second field in the first record of the network resource database;
determining on the basis of said reading act that the second field identifies the backup server for the network resource as the second server;
activating the monitoring by the second server of the first server, in response to said determining act; and
ascertaining at the second server a failure of the first server.
8. The method for fault tolerant access to a network resource of claim 6, wherein said act of ascertaining at the second server a failure of the first server, further includes the acts of:
sending packets intermittently from the second server to the first server;
receiving acknowledgments from the first server at the second server, the acknowledgments responsive to said sending act; and
noticing a termination in the receipt of acknowledgments from the first server.
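For illustration only, the packet-and-acknowledgment scheme of claim 8, together with the recovery detection of claim 4, can be sketched as a heartbeat monitor. The transport is abstracted behind a hypothetical `send_probe` callable that returns `True` when an acknowledgment arrives. The class name and the `max_missed` threshold are assumptions of the sketch, not details from the patent.

```python
class HeartbeatMonitor:
    """Track a peer server's health from probe acknowledgments."""

    def __init__(self, send_probe, max_missed=3):
        self.send_probe = send_probe  # callable: True if an ack was received
        self.max_missed = max_missed  # consecutive misses before declaring failure
        self.missed = 0
        self.peer_up = True

    def tick(self):
        """Send one probe, update the peer's status, and return it."""
        if self.send_probe():
            # Acknowledgment received (or re-acquired after an outage).
            self.missed = 0
            self.peer_up = True
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.peer_up = False  # acknowledgments have terminated
        return self.peer_up


# Simulated ack sequence: two acks, four losses, then the peer returns.
acks = iter([True, True, False, False, False, False, True])
hb = HeartbeatMonitor(send_probe=lambda: next(acks), max_missed=3)
history = [hb.tick() for _ in range(7)]
print(history)
```

In a deployment, `tick` would run on a timer and `send_probe` would wrap a real network round trip with a timeout. Both are left out here to keep the example deterministic.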
9. The method for fault tolerant access to a network resource of claim 5, wherein said act of recognizing the backup server for the network resource, further includes the acts of:
reading the second field in the first record of the network resource database; and
determining that the second field identifies the backup server for the network resource as the second server.
10. A program storage device encoding instructions for:
causing a computer to provide a network resource database, the database including individual records corresponding to network resources, and the network resource database including a first record corresponding to the network resource and the first record identifying a primary server for the network resource as a first server;
causing a computer to select, on the basis of the first record, the first server to provide communications between a client workstation and the network resource;
causing a computer to recognize the backup server for the network resource as the second server;
causing a computer to detect a failure of the first server, including:
causing a computer to monitor across a common bus, at the second server, communications between the first server and the network resource across the common bus by noting a continual change in state of the network resource, and
causing a computer to observe a termination in the communications between the first server and the network resource across the common bus by noting a stop in the continual change in state of the network resource; and
causing a computer to route communications between the client workstation and the network resource via the second server, responsive to said recognizing and detecting acts.
11. The program storage device of claim 10, further comprising instructions for:
causing a computer to identify in the first record, the primary server for the network resource as the first server;
causing a computer to discover a recovery of the first server; and
causing a computer to re-route communications between the client workstation and the network resource via the first server, responsive to said identifying and discovering acts.
12. The program storage device of claim 11, further including instructions for:
causing a computer to replicate a network resource database on the first and the second server.
13. The program storage device of claim 11, wherein said instructions for causing a computer to detect a recovery of the first server, further includes instructions for:
causing a computer to send packets intermittently from the second server to the first server; and
causing a computer to acquire acknowledgments from the first server at the second server, the acknowledgments responsive to said sending act and to the recovery of the first server.
14. The program storage device of claim 13, further including instructions for:
causing a computer to choose the first server as the primary server and the second server as the backup server, for the network resource; and
causing a computer to store in a first field of the first record the primary server for the network resource and storing in a second field of the first record the backup server for the network resource.
15. The program storage device of claim 14, wherein said instructions for causing a computer to choose, further include:
causing a computer to allow a network administrator to select the primary and the backup server.
16. The program storage device of claim 14, wherein said instructions for causing a computer to detect a failure of the first server further include instructions for:
causing a computer to read the second field in the first record of the network resource database;
causing a computer to determine on the basis of said reading act that the second field identifies the backup server for the network resource as the second server;
causing a computer to activate the monitoring by the second server of the first server, in response to said determining act; and
causing a computer to ascertain at the second server a failure of the first server.
17. The program storage device of claim 16, wherein said instructions for causing a computer to ascertain at the second server a failure of the first server, further includes instructions for:
causing a computer to send packets intermittently from the second server to the first server;
causing a computer to receive acknowledgments from the first server at the second server, the acknowledgments responsive to said sending act; and
causing a computer to notice a termination in the receipt of acknowledgments from the first server.
18. The program storage device of claim 14, wherein said instructions for causing a computer to recognize the backup server for the network resource, further include instructions for:
causing a computer to read the second field in the first record of the network resource database; and
causing a computer to determine on the basis of said reading act that the second field identifies the backup server for the network resource as the second server.
19. A method for providing fault tolerant access to a network resource, on a network with a client workstation and a first and a second server and a network resource database, wherein the network resource database includes a first record corresponding to a network resource and the first record includes a first field containing the name of the network resource and a second field containing the host server affiliation of the network resource; said method for fault tolerant access comprising the acts of:
expanding the network resource database to include a third field for naming the primary server affiliation for the network resource and a fourth field for naming the backup server affiliation for the network resource;
naming the first server in the third field;
selecting, on the basis of the first record, the first server to provide communications between the client workstation and the network resource;
naming the second server in the fourth field;
recognizing, on the basis of the fourth field of the first record, the backup server for the network resource as the second server;
detecting a failure of the first server, including the acts of:
monitoring across a common bus, at the second server, communications between the first server and the network resource across the common bus by noting a continual change in state of the network resource, and
observing a termination in the communications between the first server and the network resource across the common bus by noting a stop in the continual change in state of the network resource; and
routing communications between the client workstation and the network resource via the second server, responsive to said recognizing and detecting acts.
20. The method for fault tolerant access to a network resource of claim 19, further comprising the acts of:
monitoring the server named in the third field;
discovering a recovery of the server named in the third field; and
re-routing communications between the client workstation and the network resource via the first server, responsive to said monitoring and discovering acts.
21. The method for fault tolerant access to a network resource of claim 20, wherein said act of discovering a recovery of the first server, further includes the acts of:
sending packets intermittently from the second server to the first server; and
re-acquiring acknowledgments from the first server at the second server, the acknowledgments responsive to said sending act and to the recovery of said first server.
22. The method for fault tolerant access to a network resource of claim 19, wherein said naming acts, include the acts of:
allowing a network administrator to name the primary server affiliation and the backup server affiliation in the third and fourth fields of the first record of the network resource database.
23. The method for fault tolerant access to a network resource of claim 19, wherein said act of detecting a failure of the first server, further includes the acts of:
sending packets intermittently from the second server to the first server;
receiving acknowledgments from the first server at the second server, the acknowledgments responsive to said sending act; and
noticing a termination in the receipt of acknowledgments from the first server.
24. A computer usable medium having computer readable program code means embodied therein for causing fault tolerant access to a network resource on a network with a client workstation and a first and second server, and a network resource database, wherein the network resource database includes a first record corresponding to a network resource and the first record includes a first field containing the name of the network resource and a second field containing the host server affiliation of the network resource; the computer readable program code means in said article of manufacture comprising:
computer readable program code means for causing a computer to expand the network resource database to include a third field for naming the primary server affiliation for the network resource and a fourth field for naming the backup server affiliation for the network resource;
computer readable program code means for causing a computer to name the first server in the third field;
computer readable program code means for causing a computer to select, on the basis of the first record, the first server to provide communications between the client workstation and the network resource;
computer readable program code means for causing a computer to name the second server in the fourth field;
computer readable program code means for causing a computer to recognize, on the basis of the fourth field of the first record, the backup server for the network resource as the second server;
computer readable program code means for monitoring across a common bus, at a second server, communications between the first server and the network resource across the common bus by noting a continual change in state of the network resource, and observing a termination in the communications between the first server and the network resource across the common bus by noting a stop in the continual change in state of the network resource; and
computer readable program code means for causing a computer to route communications between the client workstation and the network resource via the second server, responsive to said recognizing and detecting acts.
25. The computer readable program code means in said article of manufacture of claim 24, further comprising:
computer readable program code means for causing a computer to monitor the server named in the third field;
computer readable program code means for causing a computer to discover a recovery of the server named in the third field; and
computer readable program code means for causing a computer to re-route communications between the client workstation and the network resource via the first server, responsive to said monitoring and discovering acts.
26. The computer readable program code means in said article of manufacture of claim 25, wherein said computer readable program code means for causing a computer to discover a recovery, further includes:
computer readable program code means for causing a computer to send packets intermittently from the second server to the first server; and
computer readable program code means for causing a computer to re-acquire acknowledgments from the first server at the second server, the acknowledgments responsive to said sending act and to the recovery of said first server.
27. The computer readable program code means in said article of manufacture of claim 24, wherein said computer readable program code means for causing a computer to name, further includes:
computer readable program code means for causing a computer to allow a network administrator to name the primary server affiliation and the backup server affiliation in the third and fourth fields of the first record of the network resource database.
28. The computer readable program code means in said article of manufacture of claim 24, wherein said computer readable program code means for causing a computer to detect a failure, further includes:
computer readable program code means for causing a computer to send packets intermittently from the second server to the first server;
computer readable program code means for causing a computer to receive acknowledgments from the first server at the second server, the acknowledgments responsive to said sending act; and
computer readable program code means for causing a computer to notice a termination in the receipt of acknowledgments from the first server.
Description
APPENDICES
Appendix A, which forms a part of this disclosure, is a list of commonly owned copending U.S. patent applications. Each one of the applications listed in Appendix A is hereby incorporated herein in its entirety by reference thereto.
Appendix B, which forms part of this disclosure, is a copy of the U.S. provisional patent application filed May 13, 1997, entitled "Clustering Of Computer Systems Using Uniform Object Naming And Distributed Software For Locating Objects," and assigned application Ser. No. 60/046,327.
COPYRIGHT AUTHORIZATION
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
1. Field of Invention
The present invention pertains to computer networks. More particularly, this invention relates to improving the ability of a network to route around faulty components.
2. Description of Related Art
As computer systems and networks become more complex, various systems for promoting fault tolerance have been devised. To prevent network down-time due to power failure, uninterruptible power supplies (UPS) have been developed. A UPS is basically a rechargeable battery to which a workstation or server is connected. In the event of a power failure, the workstation or server is maintained in operation by the rechargeable battery until such time as the power resumes.
To prevent network down-time due to failure of a storage device, data mirroring was developed. Data mirroring provides for the storage of data on separate physical devices operating in parallel with respect to a file server. Duplicate data is stored on separate drives. Thus, when a single drive fails the data on the mirrored drive may still be accessed.
To prevent network down-time due to failure of a print/file server, server mirroring has been developed. Server mirroring as it is currently implemented requires a primary server and storage device, a backup server and storage device, and a unified operating system linking the two. An example of a mirrored server product is the Software Fault Tolerance level 3 (SFT III) product by Novell Inc., 1555 North Technology Way, Orem, Utah, as an add-on to its NetWare.RTM. 4.x product. SFT III maintains servers in an identical state of data update. It separates hardware-related operating system (OS) functions on the mirrored servers so that a fault on one hardware platform does not affect the other. The server OS is designed to work in tandem with two servers. One server is designated as a primary server, and the other is a secondary server. The primary server is the main point of update; the secondary server is in a constant state of readiness to take over. Both servers receive all updates through a special link called a mirrored server link (MSL), which is dedicated to this purpose. The servers also communicate over the local area network (LAN) that they share in common, so that one knows if the other has failed even if the MSL has failed. When a failure occurs, the second server automatically takes over without interrupting communications in any user-detectable way. Each server monitors the other server's NetWare Core Protocol (NCP) acknowledgments over the LAN to see that all the requests are serviced and that the OSs are constantly maintained in a mirrored state.
When the primary server fails, the secondary server detects the failure and immediately takes over as the primary server. The failure is detected in one or both of two ways: the MSL link generates an error condition when no activity is noticed, or the servers communicate over the LAN, each one monitoring the other's NCP acknowledgment. The primary server is simply the first server of the pair that is brought up. It then becomes the server used at all times and it processes all requests. When the primary server fails, the secondary server is immediately substituted as the primary server with identical configurations. The switch-over is handled entirely at the server end, and work continues without any perceivable interruption.
Power supply backup, data mirroring, and server mirroring all increase security against down time caused by a failed hardware component, but they all do so at considerable cost. Each of these schemes requires the additional expense and complexity of standby hardware that is not used unless there is a failure in the network. Mirroring, while providing redundancy to allow recovery from failure, does not allow the redundant hardware to be used to improve cost/performance of the network.
What is needed is a fault tolerant system for computer networks that can provide all the functionality of UPS, disk mirroring, or server mirroring without the added cost and complexity of standby/additional hardware. What is needed is a fault tolerant system for computer networks which smoothly interfaces with existing network systems.
SUMMARY OF THE INVENTION
In an embodiment of the invention, the method comprises the acts of:
providing a network resource database, the database includes individual records corresponding to clustered network resources, and for the clustered resources, the database includes a first record corresponding to the network resource, and the first record identifies the primary server for the network resource as the first server;
selecting, on the basis of the first record, the first server to provide service to the client workstation with respect to that clustered network resource;
recognizing the backup server for that clustered network resource as the second server;
detecting a failure of the first server; and
routing communications between the client workstation and the network resource via the second server, responsive to the recognizing and detecting acts.
In another embodiment of the invention, the method comprises the additional acts of:
identifying in the first record, the primary server for the network resource as the first server;
discovering a recovery of the first server; and
re-routing communications between the client workstation and the network resource via the first server, responsive to the identifying and discovering acts.
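By way of illustration only, the routing decision implied by the acts above might be sketched as follows. The sketch is not part of the disclosure: the names ResourceRecord and select_server and the is_up callback are hypothetical, and the decision is computed per request here for brevity, whereas the embodiments described below record it in the network resource database.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ResourceRecord:
    """Hypothetical stand-in for the 'first record' of the summary."""
    resource: str
    primary: str  # the first server
    backup: str   # the second server


def select_server(record: ResourceRecord, is_up: Callable[[str], bool]) -> str:
    """Return the server through which a client should reach the resource.

    The primary is preferred; the backup is used only while the primary
    is reported down. An error is raised if neither server is reachable.
    """
    if is_up(record.primary):
        return record.primary
    if is_up(record.backup):
        return record.backup
    raise RuntimeError(f"no server available for {record.resource}")


record = ResourceRecord("RAID-80", primary="server 56", backup="server 54")
up = {"server 56": True, "server 54": True}

normal = select_server(record, up.__getitem__)       # primary serves
up["server 56"] = False                              # failure detected
failed_over = select_server(record, up.__getitem__)  # backup serves
up["server 56"] = True                               # recovery discovered
failed_back = select_server(record, up.__getitem__)  # primary serves again
```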
DESCRIPTION OF FIGURES
FIG. 1 is a hardware block diagram of a prior art network with replicated database for tracking network resources.
FIG. 2 is a functional block diagram of the replicated database used in the prior art network shown in FIG. 1.
FIG. 3 is a hardware block diagram showing a network with server resident processes for providing fault tolerant network resource recovery in accordance with the current invention.
FIG. 4 is a detailed block diagram of an enhanced network resource object definition which operates in conjunction with the server resident processes of FIG. 3.
FIGS. 5A-E are hardware block diagrams showing detection, fail-over and fail-back stages of the current invention for a storage device connected to a primary and backup server.
FIGS. 6A-E show the object record for the storage device of FIG. 5 on both the primary and secondary server during the stages of detection, fail-over and fail-back.
FIG. 7 is a functional block diagram showing the processing modules of the current invention on a server.
FIGS. 8A-C are process flow diagrams showing the authentication, detection, fail-over, recovery detection, and fail-back processes of the current invention.
DETAILED DESCRIPTION
The method of the current invention provides a fault tolerant network without hardware mirroring. The invention involves an enhanced replicated network directory database which operates in conjunction with server resident processes to remap network resources in the event of a server failure. In some embodiments, the enhanced network directory database is replicated throughout all servers in the cluster. The records/objects in the enhanced database contain, for at least one clustered resource, a primary and a secondary server affiliation. Initially, all users access a clustered resource through the server identified in the enhanced database as being the primary server for that clustered resource. When server resident processes detect a failure of the primary server, the enhanced database is updated to reflect the failure of the primary server, and to change the affiliation of the resource from its primary to its backup server. The updating and remapping is accomplished by server resident processes which detect failure of the primary server, and remap the network resource server affiliation. This remapping occurs transparently to whichever user/client is accessing the resource. Thus, network communications are not interrupted and all users access a resource through its backup server, while its primary server is out of operation. This process may be reversed when the primary server resumes operation, thereby regaining fault tolerant, i.e., backup capability.
No dedicated redundant resources are required to implement the current invention. Rather, the current invention allows server resident processes to intelligently reallocate servers to network resources in the event of server failure.
FIG. 1 is a hardware block diagram of a prior art enterprise network comprising LAN segments 50 and 52. LAN segment 50 comprises workstations 70-72, server 58, storage device 82, printer 86 and router 76. Server 58 includes display 64. LAN segment 52 includes workstations 66 and 68, servers 54 and 56, storage devices 78 and 80, printer 84 and router 74. Server 56 includes display 62. Server 54 includes display 60. All servers in the cluster, and substantially all operating components of each clustered server, are available to improve the cost/performance of the network.
LAN segment 50 includes a LAN connection labeled "LAN-2" between workstations 70 and 72, server 58 and router 76. Storage device 82 and printer 86 are connected to server 58 and are available locally as network resources to either of workstations 70 and 72. LAN segment 50 is connected by means of router 76 to LAN segment 52 via router 74. LAN segment 52 includes a LAN connection labeled "LAN-1" between workstations 66 and 68, servers 54-56 and router 74. Storage device 78 and printer 84 are connected to server 54. Storage device 80 is connected to server 56. Either of workstations 66 and 68 can connect via server 54 locally to printer 84 and storage device 78. Either of workstations 66 and 68 can connect locally via server 56 to storage device 80.
Each of the servers 54, 56 and 58 includes respectively copies 88A, 88B, and 88C of the replicated network directory database. Such a replicated network directory database is part of the NetWare Directory Services (NDS), which is provided in Novell's NetWare 4.x product. The format and functioning of this database, which is a foundation for the current invention, is described in greater detail in FIG. 2.
FIG. 2 is a detailed block diagram of replica 88A of the prior art replicated NetWare.RTM. Directory Services (NDS) database, such as is part of NetWare.RTM. 4.x. The replicated network directory database includes a directory tree 102 and a series of node and leaf objects within the tree. Leaf object 104 is referenced. In NDS, physical devices are represented by objects, i.e., logical representations of physical devices. Users are logical user accounts and are one type of NDS object. The purpose of object oriented design is to mask the complexity of the physical configuration of the network from users and administrators. A good example of a logical device is a file server. A file server is actually a logical device, such as a NetWare.RTM. server operating system running on a physical device, a computer. A file system is the logical file system represented in the file server's memory and then saved on the physical hard drive. In NDS, a file server is a type of object and so is a NetWare.RTM. volume. Objects represent physical resources, users and user related entities such as groups. Object properties are different types of information associated with an object. Property values are simply names and descriptions associated with the object properties. For example, HP3 might be the property value for the printer name object property which is in turn associated with the printer object.
NDS uses a hierarchical tree structure to organize the various objects. Hence, the structure is referred to as the NDS tree. The tree is made up of three types of objects: the root object, container objects and leaf objects. The location in which objects are placed in a tree is called a context or name context, similar to a pointer inside a database. The context is of key importance; to access a resource the user object must be in the same context as the resource object. A user object has access to all objects that lie in the same directory and in child directories. The root object is the top of a given directory tree. Branches are made of container objects. Within them are leaf objects. A crude analogy is the directory tree of your hard disk. There is a backslash or root at the base of the tree, each subdirectory can be compared to a container object, and files within the subdirectories can be compared to leaf objects in NDS. The root object is created automatically when you first install NDS. It cannot be renamed or deleted and there is only one root in a given NDS tree. Container objects provide a way to logically organize other objects in the NDS tree. A container object can house other container objects within it. The top container is called the parent object. Objects contained in a container object are said to be child objects. There are three types of parents or containers: organization, organizational unit and country. You must have at least one organization object within the NDS tree and it must be placed one level below root. The organization object is usually used to denote a company. Organizational units are optional. If you use them, they are placed one level below an organization object. Leaf objects include user, group, server, volume, printer, print queue and print server. Associated with each object is a set of object rights.
The directory tree is a distributed database that must make information available to users connected to various parts of the network. Novell Directory Services introduces two new terms, partition and replica, to describe the mechanics of how the directory tree is stored. The directory tree is divided into partitions. A partition is a section of the database under an organizational unit container object such as Marketing. Novell Directory Services divides the network directory tree into partitions for several reasons. A replica is a copy of a directory tree partition. Having replicas of directory tree partitions on multiple servers has both good and bad points. Each replica must contain information that is synchronized with every corresponding replica. A change to one partition replica must be echoed to every other replica. The more replicas there are, the more network traffic is involved in keeping them synchronized.
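For illustration only, the cost of keeping replicas synchronized might be modeled as follows. The Replica and Partition classes and the sync_messages counter are hypothetical and do not reflect how NDS actually propagates changes; the sketch only shows that one change to a partition with n replicas requires n-1 echoed updates.

```python
class Replica:
    """One server's copy of a directory partition (hypothetical model)."""

    def __init__(self, server):
        self.server = server
        self.data = {}


class Partition:
    """A partition whose replicas must all hold the same data."""

    def __init__(self, servers):
        self.replicas = [Replica(s) for s in servers]
        self.sync_messages = 0  # counts replica-to-replica updates

    def write(self, origin, key, value):
        """Apply a change at one replica and echo it to every other replica."""
        for replica in self.replicas:
            replica.data[key] = value
            if replica.server != origin:
                self.sync_messages += 1

    def in_sync(self):
        """True when every replica holds identical data."""
        first = self.replicas[0].data
        return all(r.data == first for r in self.replicas)


partition = Partition(["server 54", "server 56", "server 58"])
partition.write("server 56", "RAID-80.host", "server 56")
```

With three replicas, the single write above produces two synchronization messages; each further replica adds one more message per change.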
The directory tree 102 in FIG. 2 comprises a root node, a company node, organizational nodes, and leaf nodes. The company node is directly beneath the root node. Beneath the company node are two organizational unit nodes labeled Accounting and Marketing. Beneath each organizational unit node is a series of leaf nodes representing those network resources associated with the Accounting unit and the Marketing unit. The organizational unit labeled Accounting has associated with it the following leaf nodes: server 58, storage device 82, printer 86, and users A and B [See FIG. 1]. The organizational unit labeled Marketing has associated with it the following leaf nodes: server 56, server 54, storage device 80, storage device 78, printer 84, and users C and D [See FIG. 1].
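As a minimal sketch, the directory tree just described could be represented as shown below. The Node class and the find helper are hypothetical names chosen for illustration; storage device 80 is entered under its resource name RAID-80, and the context method returns the chain of containers above a node in the backslash-separated form used for context property values.

```python
class Node:
    """A node in a directory tree: the root, a container, or a leaf."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add(self, name):
        """Create a child node beneath this node and return it."""
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def context(self):
        """Return the chain of containers above this node, root first."""
        parts = []
        node = self.parent
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "\\".join(reversed(parts)) + "\\"


def find(node, name):
    """Depth-first search for the first node with the given name."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


root = Node("[Root]")
company = root.add("Company")
accounting = company.add("Accounting")
marketing = company.add("Marketing")
for leaf in ("server 58", "storage device 82", "printer 86", "user A", "user B"):
    accounting.add(leaf)
for leaf in ("server 56", "server 54", "RAID-80", "storage device 78",
             "printer 84", "user C", "user D"):
    marketing.add(leaf)
```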
Each object has specific object properties and property values. As defined by Novell in their NetWare.RTM. 4.x, a volume object has three object properties: context, resource name, and server affiliation. Context refers to the location in the directory tree of the object and is similar in concept to a path statement. For example, printer 86 is a resource available to the Accounting Department, and not to the Marketing Department. The next object property is resource name. The resource name is a unique enterprise wide identifier for the resource/object. The next object property is host server affiliation. Host server affiliation is an identifier of the server through which the object/resource may be accessed.
The object/resource record 104 for storage device 80 is shown in FIG. 2. The object/resource includes a context property 106, a resource name property 108, and host server affiliation property 110. The context property has a context property value 106A of: [Root]\Company\Marketing\. The resource name property value 108A is: RAID-80. The host server affiliation property value 110A is server 56. A network administrator may add or delete objects from the tree. The network administrator may alter object property values. As discussed above, any changes made in the directory of one server are propagated/replicated across all servers in the enterprise.
Enhanced Directory+Server Processes
FIG. 3 is a hardware block diagram of an embodiment of the current invention in a local area network. Users A-D are shown interfacing via workstations 66-72 with network resources. The network resources include servers 54-58, storage devices 78-82 and printers 84-86, for example. The relationship between network resources is defined not only as discussed above in connection with FIGS. 1-2 for normal operation, but also for operation in the event of a failure of any one of the network resources. This additional utility, i.e., fault tolerance, is a result of enhancements to the network directory databases 150A-C and processes 152A-C resident on each server. The server resident processes operating in conjunction with the enhanced network directory database allow failure detection, resource/object remapping and recovery. Thus, network downtime is reduced by transparently remapping network resources in response to a detection of a failure. The remapped route is defined within the enhanced directory. The routes that are defined in the directory may be part of the initial administrative setup; or may be a result of an automatic detection process; or may be a result of real time arbitration between servers. The server resident processes 152A-C have the additional capability of returning the resource/network to its initial configuration when the failed resource has been returned to operation. This latter capability is also a result of the interaction between the host resident processes 152A-C and the enhanced network directory 150A-C.
FIG. 4 is a detailed block diagram of the enhanced object/resource properties provided within the enhanced network directory database. Object 200A within enhanced network directory database 150A is shown. Object 200A contains object properties and property values for the storage device RAID 80 shown in FIG. 3. In addition to the three prior art object properties, i.e., context property 106, resource name property 108 and host server affiliation property 110, additional properties are defined for the object in the enhanced network directory database. Property 210 is the primary server affiliation for the resource, i.e., RAID-80. The property value 210A for the primary server affiliation is server 56. Thus, server 56 will normally handle network communications directed to RAID-80. Property 212 is the backup server affiliation for the resource. The property value 212A for the backup server affiliation is server 54. Thus server 54 will handle network communications with RAID-80 in the event of a failure of server 56. Cluster property 214 indicates whether the resource is cluster capable, i.e., can be backed up. The values for this property are boolean True or False. The cluster property value in FIG. 4 is boolean True, which indicates that RAID 80 has physical connections to more than one server, i.e., servers 54-56. Enable property 216 indicates whether a cluster capable object will be cluster enabled, i.e., will be given a backup affiliation. The property values associated with this property are boolean True or False. The enable property value 216A of boolean TRUE indicates that RAID 80 is cluster capable and that clustering/backup capability has been enabled. The optional auto-recover property 218 indicates whether the cluster enabled object is to be subject to automatic fail back and/or auto recovery. The auto-recover property has the property values of boolean True or False. The auto-recover property value 218A is TRUE, which indicates that RAID 80 will fail back without user confirmation to server 56 when server 56 recovers. Prior state property 220 indicates the prior state of the resource. The property values associated with this are: OK, fail over in progress, fail over complete, fail back in process, fail back complete. The prior state property value 220A of OK indicates that this resource RAID 80 has not failed. The priority property 222 indicates the priority for fail over. The priority property may have values of 1, 2 or 3. This property may be utilized to stage the fail over of multiple resources where the sequencing of recovery is critical. The priority property value 222A of "2" indicates that RAID 80 will have an intermediate staging in a recovery sequence. The hardware ID property 224 is the unique serial identifier associated with each hardware resource. The hardware ID property value 224A of :02:03:04:05 indicates that RAID 80 is comprised of four volumes, each with its own unique serial number identifier. Any object in the enhanced network directory database may be clustered/backed up. Therefore the methods of certain embodiments of the current invention are equally applicable to printers, print queues, and databases as to storage devices.
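Purely as an illustration, the enhanced object of FIG. 4 could be modeled as the record type below. The class and field names are hypothetical, as is the fail_over_order helper. Two points are assumptions rather than statements of the disclosure: that a lower priority number is staged earlier, and that an object may only be cluster enabled if it is cluster capable.

```python
from dataclasses import dataclass
from enum import Enum


class PriorState(Enum):
    OK = "OK"
    FAIL_OVER_IN_PROGRESS = "fail over in progress"
    FAIL_OVER_COMPLETE = "fail over complete"
    FAIL_BACK_IN_PROCESS = "fail back in process"
    FAIL_BACK_COMPLETE = "fail back complete"


@dataclass
class EnhancedObject:
    # Prior art properties (106, 108, 110)
    context: str
    resource_name: str
    host_server: str
    # Enhanced properties (210 through 224)
    primary_server: str
    backup_server: str
    cluster: bool
    enable: bool
    auto_recover: bool
    prior_state: PriorState
    priority: int
    hardware_id: str

    def __post_init__(self):
        if self.priority not in (1, 2, 3):
            raise ValueError("priority must be 1, 2 or 3")
        if self.enable and not self.cluster:
            # Assumed constraint: enabling presupposes cluster capability.
            raise ValueError("only a cluster capable object can be enabled")


def fail_over_order(objects):
    """Cluster enabled objects sorted by priority (assumed: 1 goes first)."""
    return sorted((o for o in objects if o.enable), key=lambda o: o.priority)


raid80 = EnhancedObject(
    context="[Root]\\Company\\Marketing\\",
    resource_name="RAID-80",
    host_server="server 56",
    primary_server="server 56",
    backup_server="server 54",
    cluster=True,
    enable=True,
    auto_recover=True,
    prior_state=PriorState.OK,
    priority=2,
    hardware_id=":02:03:04:05",
)
```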
FIGS. 5A-E show a sequence of detection, fail-over and fail-back for storage device 80. Storage device 80 may be affiliated physically with either one of two servers. FIG. 5A includes workstations 66-68, servers 54-56, storage devices 78-80, printer 84 and router 74. Server 54 includes a display 60. Server 56 includes a display 62. Workstations 66 and 68 are connected via LAN-1 to router 74 and servers 54 and 56. Server 56 is directly connected to storage device 80. Server 54 is directly connected to printer 84 and storage device 78. A connection 250 also exists between servers 54-56 and storage devices 78-80. Servers 54-56 contain, respectively, replicas 150B-A of the enhanced network directory database. Server 54 runs process 152B for detection, fail-over and fail-back. Server 56 runs process 152A for detection, fail-over and fail-back. In FIG. 5A, server 56 has a primary relationship with respect to storage device 80. This relationship is determined by the object properties for storage device 80 in the replicated network directory database [see FIG. 4]. Communication 252 flows between RAID 80 and server 56. In the example shown, workstation 68 is communicating via server 56 with storage device 80. Server 54 has a primary relationship with storage device 78 as indicated by communication marker 254. This relationship is determined by the object properties for storage device 78 in the replicated network directory database.
FIG. 5B shows an instance in which server 56 and the process resident thereon has failed, as indicated by the termination marks 256 and 258. Communications via server 56 between workstation 68 and storage device 80 are terminated.
As shown in FIG. 5C, process 152B running on server 54 has detected the failure of server 56 and has remapped communications between workstation 68 and storage device 80 via server 54. This remapping is the result of the process 152B running on server 54. These processes have detected the failure of server 56. They have determined on the basis of backup property values for storage device 80 stored in the enhanced network directory database 150B that server 54 can provide backup capability for storage device 80. Finally, they have altered the property values on the object/record for storage device 80 within the enhanced network directory database to cause communications with the storage device 80 to be re-routed through server 54.
FIG. 5D indicates that server 56 has resumed normal operation. Server 56's replica 262 of the enhanced network directory database is stale or out of synchronization with the current replica 150B contained in server 54. The replica 262 is updated to reflect the changes in the server-to-storage device configuration brought about by the processes running on server 54. Communications between workstation 68 and storage device 80 continue to be routed via server 54, as indicated by communication marker 260, because that server is listed on all replicas of the enhanced network directory database as the host server for resource/object identified as storage device 80.
In FIG. 5E, updated replicas 150A-B of the network directory database are present on servers 56-54, respectively. Processes 152A-B running on servers 56-54, respectively, have caused the network to be reconfigured to its original architecture in which server 56 is the means by which workstation 68, for example, communicates with storage device 80. This fail-back is a result of several acts performed cooperatively by processes 152A-B. Process 152B detects re-enablement of server 56. Process 152B relinquishes ownership of storage device 80 by server 54. Process 152A running on server 56 detects relinquishment of ownership of storage device 80 by server 54 and in response thereto updates the host server property value for resource/object storage device 80 in the replicated network directory database. Communications between workstation 68 and storage device 80 are re-established via server 56, as indicated by communication marker 252. All of these processes may take place transparently to the user, so as not to interrupt network service. During the period of fail-over, server 54 handles communications to both storage device 78 and storage device 80.
FIGS. 6A-6E show the object properties and property values for storage device 80, during the events shown at a hardware level in FIGS. 5A-E. These object properties and property values are contained in replicas 150A-B of the enhanced network directory database and specifically stored in servers 56-54. In FIGS. 6A-E, the object property values for storage device 80 are shown. On the left-hand side of each figure the object/record that is stored in server 56 is shown. On the right-hand side of each figure the object/record that is stored in server 54 is shown. The enhanced directory replicated on each server contains multiple objects representing all network resources. The enhanced directory on server 56 is shown to contain an additional object 202A and the enhanced directory in server 54 is shown to contain an additional object 202B, representative of the many objects within each enhanced directory database.
FIG. 6A shows an initial condition in which the host server property value 110A/B and the primary server property value 210A/B match. In FIG. 6B, server 56 has failed and therefore the replica of the enhanced network directory database and each of the objects within that directory are no longer available as indicated by failure mark 258. Nevertheless, an up-to-date current replica of the enhanced network directory database is still available on server 54 as indicated by objects 200B and 202B on the right-hand side of FIG. 6B. In FIG. 6C, the fail-over corresponding to that discussed above in connection with FIG. 5C is shown. The host server property value 110B has been updated to reflect current network routing during the failure of server 56. The host server property value 110B is server 54. Because the host server for the resource/object for storage device 80 appears on all servers as server 54, all communications between workstations, e.g., workstation 68, and storage device 80 are re-routed through server 54. The fail-over is accomplished by resident process 152B on server 54 [see FIGS. 5A-E]. These processes detect the failure of server 56. Then they determine which server is listed in the resource/object record for storage device 80 as a backup. Next the processes write the backup property value to the host property value for storage device 80. Replicas of this updated set of property value(s) for object 200B, corresponding to the storage device 80, are then replicated throughout the network. As indicated in FIG. 6C, the prior state property value 220B is updated by the resident process 152B [see FIGS. 5A-E] to indicate that a fail-over has taken place.
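The fail-over steps described for FIG. 6C might be sketched as follows, for illustration only. The dictionary layout, the fail_over function and the live_servers argument are hypothetical; replication is modeled as a simple copy to each reachable replica, which is a simplification of the directory's own replication.

```python
import copy


def make_record():
    """Property values for storage device 80 before any failure."""
    return {"host": "server 56", "primary": "server 56",
            "backup": "server 54", "enable": True, "prior_state": "OK"}


# One replica per server, each mapping resource name to its record.
replicas = {
    "server 54": {"RAID-80": make_record()},
    "server 56": {"RAID-80": make_record()},
}


def fail_over(replicas, local_server, failed_server, live_servers):
    """Run on local_server after it has detected that failed_server is down."""
    local = replicas[local_server]
    moved = []
    for name, rec in local.items():
        # Act only on enabled resources hosted by the failed server
        # for which this server is listed as the backup.
        if not rec["enable"]:
            continue
        if rec["host"] != failed_server or rec["backup"] != local_server:
            continue
        rec["host"] = rec["backup"]              # write backup value to host
        rec["prior_state"] = "fail over complete"
        moved.append(name)
    # Replicate the updated records to every other reachable replica.
    for server in live_servers:
        if server != local_server:
            for name in moved:
                replicas[server][name] = copy.deepcopy(local[name])
    return moved


moved = fail_over(replicas, "server 54", "server 56", live_servers={"server 54"})
```

Because server 56 is unreachable during the fail-over, its replica is left untouched and still names server 56 as host, which is the stale condition addressed when that server returns to service.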
As indicated in FIG. 6D, when server 56 first comes back on line it contains a stale, i.e., out of date copy of the property values for all objects including object 200A corresponding to storage device 80. The existing functionality of NetWare.RTM. 4.x causes this stale enhanced directory to be refreshed with the current property values generated, in this instance, by the resident process 152B on server 54. [see FIGS. 5A-E]
FIG. 6E indicates the completion of a fail-back. The resident process 152B [see FIGS. 5A-E] on server 54 detects resumption of operation by server 56. These processes then relinquish ownership of storage device 80. Then the resident process 152A [see FIGS. 5A-E] on server 56 asserts ownership of storage device 80 and rewrites the host server property value 110A for storage device 80 to correspond to the primary server property value 210A, which is server 56. This updated record/object is again replicated in all the replicated network directory databases throughout all the servers on the network. Thus, all communications between workstations, e.g., workstation 68, and storage device 80 are routed through server 56. [see FIGS. 5A-E] The prior state property values 220A-B are set to fail-back, indicating that the prior state for the object is fail-back.
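For illustration only, the cooperative fail-back can be sketched as two steps, following the description of FIG. 5E in which the backup server relinquishes ownership and the primary server then reclaims it. The relinquish and reclaim functions and the owners table are hypothetical, and the prior state strings are taken from the list of property values given for property 220.

```python
def relinquish(record, owners, backup_server):
    """Backup side: give up the device once the primary is back in service."""
    name = record["name"]
    if owners.get(name) != backup_server:
        return False  # this server does not own the device
    owners[name] = None
    record["prior_state"] = "fail back in process"
    return True


def reclaim(record, owners, primary_server):
    """Primary side: take the device back after it has been relinquished."""
    name = record["name"]
    if record["primary"] != primary_server:
        return False  # only the listed primary may reclaim
    if owners.get(name) is not None:
        return False  # the backup has not relinquished yet
    owners[name] = primary_server
    record["host"] = record["primary"]  # host value again matches primary
    record["prior_state"] = "fail back complete"
    return True


record = {"name": "RAID-80", "host": "server 54", "primary": "server 56",
          "backup": "server 54", "prior_state": "fail over complete"}
owners = {"RAID-80": "server 54"}

too_early = reclaim(record, owners, "server 56")   # backup still owns it
released = relinquish(record, owners, "server 54")
reclaimed = reclaim(record, owners, "server 56")
```

Keeping the two steps separate means the device never has two owners at once; the primary can only reclaim after the backup has released it.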
FIG. 7 is a block diagram of the modules resident on server 56 which collectively accomplish the process 152A associated with detection, fail-over and fail-back. Similar modules exist on each server. A server input unit 304 and display 62 are shown. Modules 306-316 are currently provided with network utilities such as NetWare.RTM. 4.x. These modules may interact with modules 320-328 in order to provide the resident process 152A for detection, fail-over and fail-back. Module 306 may be a NetWare Loadable Module (NLM) which provides a graphical interface by which a user can interact with NetWare.RTM. 4.x and also with the resident process 152A. Module 308 may be a communication module which provides connection oriented service between servers. A connection oriented service is one that utilizes an acknowledgment packet for each packet sent. Module 310 may include client-based applications which allow a user at a workstation to communicate 330 directly with both the network software and the resident process 152A. Module 150A is a replica of the enhanced network directory database which includes the additional object properties discussed above in FIGS. 3-4. Module 312, identified as Vol-Lib, is a loadable module which provides volume management services including scanning for volumes, mounting volumes and dismounting volumes. Module 314 is a media manager module which allows a server to obtain identification numbers for all resources which are directly attached to the server. Module 316 is a peripheral attachment module which allows the server to communicate with devices such as storage devices or printers which may be directly attached to it. Module 320 provides an application programming interface (API) which allows additional properties to be added to each object in the enhanced network directory database. This module also allows the property values for those additional properties to be viewed, altered, or updated.
Modules 322-328 may interact with the above discussed modules to provide the server resident processes for detection, fail-over and fail-back. Module 322 may handle communications with a user through network user terminal module 306. Module 322 may also be responsible for sending and receiving packets through NCP module 308 to manage failure detection and recovery detection of a primary server. Module 324, the directory services manager, may be responsible for communicating through module 320 with the enhanced network directory database 150A. Module 324 controls the addition of properties as well as the viewing and editing of property values within that database. Module 326 is a device driver which in a current embodiment superimposes a phase shifted signal on the peripheral communications between a server and its direct connected resources to detect server failure. Module 326 sends and receives these phase shifted signals through module 316. Module 328 controls the overall interaction of modules 322-326. In addition, module 328 interfaces with module 312 to scan, mount and dismount objects/resources. Furthermore, module 328 interacts with module 314 to obtain device hardware identifiers for those devices which are direct attached to the server. The interaction of each of these modules to provide for detection, fail-over and fail-back will be discussed in detail in connection with FIGS. 8A-C, which follow.
FIGS. 8A-C show an embodiment of the processes for detection, fail-over and fail-back which are resident on all servers. FIG. 8A identifies the initial processes on both primary and backup servers for creating and authenticating objects. The authentication of an object/resource involves determining its cluster capability, and its primary and backup server affiliation. FIG. 8B indicates processes corresponding to the failure detection and fail-over portion of the processes. FIG. 8C corresponds to those processes associated with recovery and fail-back. FIGS. 8A-C each have a left-hand and a right-hand branch, identified as the primary and backup branches respectively. Since each server performs as a primary with respect to one object and a secondary with respect to another object, it is a characteristic of the resident processes that they will run alternately in a primary and a backup mode depending on the particular object being processed. For example, when an object being processed has a primary relationship with respect to the server running the processes, then the processes in FIGS. 8A-C identified as primary will be conducted. Alternatively, when an object being processed has a secondary relationship with respect to the server running the processes, then the processes in FIGS. 8A-C identified as backup will be conducted. An object which has neither a primary nor a backup relationship with the server running the process will not be subject to detection, fail-over or fail-back processing.
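The per-object choice between the primary and backup branches can be sketched as follows. This is a minimal illustration only: the dictionary keys, the object names and the string server identifiers are hypothetical, and the actual resident processes read these affiliations from the enhanced network directory database.

```python
def role_for(obj, server_id):
    """Return 'primary', 'backup', or None for this server's relation to obj."""
    if obj["primary_server"] == server_id:
        return "primary"
    if obj["backup_server"] == server_id:
        return "backup"
    return None  # object is not subject to detection, fail-over or fail-back here


def split_by_role(objects, server_id):
    """Group object names by the branch this server would run for each."""
    branches = {"primary": [], "backup": []}
    for obj in objects:
        role = role_for(obj, server_id)
        if role is not None:
            branches[role].append(obj["name"])
    return branches


objects = [
    {"name": "volume-a", "primary_server": "server-56", "backup_server": "server-54"},
    {"name": "volume-b", "primary_server": "server-54", "backup_server": "server-56"},
    {"name": "volume-c", "primary_server": "server-x", "backup_server": "server-y"},
]
branches = split_by_role(objects, "server-56")
```

Here the same server runs the primary branch for one object and the backup branch for another, while the third object is ignored because the server has no affiliation with it.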
FIG. 8A sets forth an embodiment of the authentication process. During authentication, specific primary and secondary server relationships are established for a specific network object, and those relationships are written into the property values associated with that object in the enhanced network directory database. The process begins at process 350, in which a new object is created and property values for context, resource name and server affiliation are defined for that object. Control is then passed to process 352, in which the additional properties discussed above in connection with FIG. 4 are added to that object's definition. Then, in process 354, default values for a portion of the new expanded properties are added.
There are several means by which default values can be obtained. In one embodiment, default values are completely defined by a network administrator for all expanded properties at the time of object creation. In another embodiment, the one shown in FIG. 8A, only minimal default values are initially defined and the server resident processes on the primary and secondary servers define the rest. In either event, the newly created object is added to the expanded network directory and replicated throughout the network.
In FIG. 8A only the capable, enabled, auto-recover and priority properties are defined. Capable, enabled and auto-recover are all set to boolean False and priority is set to a value of "2". The partial definition of property values is feasible only because the hardware environment corresponds to that shown in FIGS. 5A-E and allows for an auto discovery process by which the property values can be automatically filled in by the primary and secondary server resident processes. These processes will now be explained in greater detail. Control then passes to process 356. In process 356, the data manager module acting through the Vol-Lib module [see FIG. 7] obtains the newly created object. Control is then passed to process 358. In process 358, the data manager module acting through the media manager module [see FIG. 7] determines the hardware IDs for all the devices direct connected to the server. Control is then passed to process 360. In process 360 the values for primary server property 210 and the hardware IDs 224 [see FIG. 4] in the expanded fields for the newly created object are filled in. The value for the primary server property 210 is equal to the ID of the server running the process. The value for the hardware ID property 224 is set equal to those hardware IDs obtained in process 358 which correspond to the hardware IDs of the object. Control is then passed to decision process 362. In process 362 a determination is made as to whether the user desires to create a new object. In the event that determination is in the affirmative, control returns to process 350.
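The primary-side steps of FIG. 8A (processes 350 through 360) can be sketched as follows. This is an illustrative sketch under stated assumptions: objects are modeled as plain dictionaries, the `attached` mapping from resource name to hardware ID stands in for the media manager scan, and all function and key names are hypothetical.

```python
def create_object(context, name, host_server):
    """Processes 350-354: create the object, add expanded properties, set defaults."""
    obj = {"context": context, "name": name, "host_server": host_server}
    # Process 352: add the expanded properties, initially undefined.
    obj.update(primary_server=None, backup_server=None, hardware_id=None,
               state=None, prior_state=None)
    # Process 354: minimal defaults (priority is the value "2" in the text,
    # modeled here as an integer).
    obj.update(capable=False, enabled=False, auto_recover=False, priority=2)
    return obj


def authenticate_as_primary(obj, server_id, attached):
    """Processes 356-360: fill in primary server and hardware ID.

    `attached` is an assumed mapping of resource name to the hardware ID
    reported by a local scan of direct connected devices.
    """
    obj["primary_server"] = server_id
    obj["hardware_id"] = attached.get(obj["name"])
    return obj


obj = create_object("org-unit", "volume-a", host_server="server-56")
authenticate_as_primary(obj, "server-56", attached={"volume-a": "hw-001"})
```

The backup server, cluster capable and cluster enabled values are deliberately left at their defaults here, since the text assigns them to the backup-side processes.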
Each server also runs backup authentication processes which are shown on the right side of FIG. 8A. These processes commence at process 370. In process 370 the data manager through the media manager [see FIG. 7] scans locally for all devices which are direct attached to the server. This local scan produces hardware IDs for those objects to which the server is direct attached. Control is then passed to process 372. In process 372 the data manager through Vol-Lib [see FIG. 7] obtains globally via the enhanced NDS database a list of all objects with hardware IDs which match those retrieved in process 370. Those objects which have hardware IDs matching those produced in the local scan conducted in process 370 are passed to decision process 376. In decision process 376 a determination is made as to which among those objects have a host server property field with a server ID corresponding to the ID of the server running these backup processes. In the event that determination is in the affirmative, control is passed to process 374 in which the next object in the batch passed from process 372 is selected. Control then returns to decision process 376. In decision process 376 objects in which the host server property ID does not match the ID of the server running the process are passed to process 378. In process 378 the server running the process has identified itself as one which can serve as a backup for the object being processed. Thus, the ID of the server running the process is placed in the backup server field 212B [see FIG. 4]. Additionally, the object's cluster capable and cluster enabled fields 214B-216B are set to a boolean True condition. The autorecover field 218B is set to boolean True as well. The priority, state and previous state fields 222B and 220B are also filled in with default values. Control is then passed to decision process 380.
In decision process 380 a determination is made as to whether the user with administrative privileges wishes to change the default values. In the event that determination is in the negative, control returns to process 370. Alternatively, if a determination in the affirmative is reached, then control is passed to process 382. In process 382 an administrator may disable the cluster enabled, autorecover and priority fields 216B, 218B and 222B. Control then also returns to process 370.
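The backup-side steps of FIG. 8A (processes 370 through 382) can be sketched in the same style. Again this is a hedged illustration: the directory is modeled as a list of dictionaries, the `defaults` argument and its values are hypothetical because the text does not state the default state values, and none of these names are part of any NetWare interface.

```python
def authenticate_as_backup(directory, server_id, local_hardware_ids, defaults):
    """Processes 370-378: claim backup affiliation for matching objects."""
    claimed = []
    for obj in directory:
        # Process 372: keep only objects whose hardware ID was seen locally.
        if obj["hardware_id"] not in local_hardware_ids:
            continue
        # Decision 376: this server already hosts the object, so skip it (374).
        if obj["host_server"] == server_id:
            continue
        # Process 378: record this server as the backup and enable clustering.
        obj["backup_server"] = server_id
        obj["capable"] = True
        obj["enabled"] = True
        obj["auto_recover"] = True
        obj.update(defaults)  # priority, state, previous state (assumed values)
        claimed.append(obj["name"])
    return claimed


def admin_override(obj, enabled=None, auto_recover=None, priority=None):
    """Process 382: an administrator may change selected fields; None leaves a field as is."""
    for key, value in (("enabled", enabled), ("auto_recover", auto_recover),
                       ("priority", priority)):
        if value is not None:
            obj[key] = value


directory = [
    {"name": "volume-a", "hardware_id": "hw-001", "host_server": "server-56"},
    {"name": "volume-b", "hardware_id": "hw-002", "host_server": "server-54"},
    {"name": "volume-c", "hardware_id": "hw-003", "host_server": "server-56"},
]
claimed = authenticate_as_backup(
    directory, "server-54", {"hw-001", "hw-002"},
    defaults={"priority": 2, "state": "unknown", "prior_state": "unknown"})
admin_override(directory[0], auto_recover=False)
```

Only the first object is claimed: the second is already hosted by the scanning server, and the third is not direct attached to it.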
FIG. 8B shows the failure detection and fail-over portions of both the primary and backup processes. The processes for a server performing as a primary with respect to an object commence with splice block A. From splice block A control passes to process 398. In process 398 a drive pulse protocol is initiated. The drive pulse protocol is appropriate for those objects which are connected to the server by a bus, a Small Computer System Interface (SCSI) bus with multiple initiators, or any other means of connection. For example, in FIGS. 5A-5E, connection 250 connects both servers 54 and 56 to storage device 80. The drive pulse protocol across connection 250 enables the secondary server to sense primary server failure, as will be discussed shortly in connection with processes 402-408. The drive pulse protocol works by the current host, on some prearranged schedule, continuously issuing SCSI "release" and "reserve" commands, to allow the backup to detect the failure of the primary. The backup detects these commands being issued by the primary by continuously sendin