United States Patent6654807
Farber , ; et al.November 25, 2003

Title

Internet content delivery network

Abstract

Resource requests made by clients of origin servers in a network are intercepted by reflector mechanisms and selectively reflected to other servers called repeaters. The reflectors select a best repeater from a set of possible repeaters and redirect the client to the selected best repeater. The client then makes the request of the selected best repeater. The resource is possibly rewritten to replace at least some of the resource identifiers contained therein with modified resource identifiers designating the repeater instead of the origin server.


Inventors:Farber; David A. (Oak View, CA), Greer; Richard E.  (Red Lodge, MT), Swart; Andrew D.  (Westlake Village, CA), Balter; James A.  (Santa Barbara, CA)
Assignee:Cable & Wireless Internet Services, Inc. (San Francisco, CA)
Appl. No.:004430
Filed:December 6, 2001

Current U.S. Class:709/225 709/226 709/229 
Field of Search:709/225,226,229

U.S. Patent Documents
4495570January 1985Kitajima et al.
4594704June 1986Ollivier
4726017February 1988Krum et al.
4839798June 1989Eguchi et al.
4920432April 1990Eggers
4922417May 1990Churm et al.
4949187August 1990Cohen
4949248August 1990Caro
5172413December 1992Bradley
5253341October 1993Rozmanith
5287499February 1994Nemes
5287537February 1994Newmark et al.
5291554March 1994Morales
5341477August 1994Pitkin
5371532December 1994Gelman
5410343April 1995Coddington
5414455May 1995Hooper
5442389August 1995Blahut
5442390August 1995Hooper
5442749August 1995Northcutt
5471622November 1995Eadline
5475615December 1995Lin
5508732April 1996Bottomley
5515511May 1996Nguyen
5519435May 1996Anderson
5528281June 1996Grady
5539621July 1996Kikinis
5542087July 1996Neimat et al.
5544313August 1996Shachnai
5544327August 1996Dan
5550577August 1996Verbiest
5550863August 1996Yurt
5550982August 1996Long
5557317September 1996Nishio
5572643November 1996Judson
5590288December 1996Castor
5592611January 1997Midgely
5594910January 1997Filepp et al.
5603026February 1997Demers et al.
5619648April 1997Canale
5623656April 1997Lyons
5625781April 1997Cline
5627829May 1997Gleeson et al.
5630067May 1997Kindell
5633999May 1997Clowes
5634006May 1997Baugher et al.
5638443June 1997Stefik et al.
5644714July 1997Kikinis
5646676July 1997Dewkett et al.
5649186July 1997Ferguson
5659729August 1997Nielsen
5666362September 1997Chen
5671279September 1997Elgamal
5682512October 1997Tetrick
5699513December 1997Feigen et al.
5712979January 1998Graber et al.
5715453February 1998Stewart
5721914February 1998DeVries
5734831March 1998Sanders
5740423April 1998Logan et al.
5742762April 1998Scholl
5751961May 1998Smyk
5761507June 1998Govett
5761663June 1998Lagarde
5764906June 1998Edelstein et al.
5774660June 1998Brendel
5774668June 1998Choquier et al.
5777989July 1998McGarvey
5784058July 1998LaStrange et al.
5796952August 1998Davis
5799141August 1998Galipeau et al.
5802291September 1998Balick et al.
5812769September 1998Graber et al.
5828847October 1998Gehr
5832506November 1998Kuzma
5832514November 1998Norin et al.
5835718November 1998Blewett
5856974January 1999Gervais et al.
5862339January 1999Bonnaure et al.
5867706February 1999Martin
5867799February 1999Lang et al.
5870546February 1999Kirsch
5870559February 1999Leshem et al.
5878212March 1999Civanlar et al.
5884038March 1999Kapoor
5890171March 1999Blumer et al.
5893116April 1999Simmonds et al.
5896533April 1999Ramos et al.
5903723May 1999Beck et al.
5907704May 1999Gudmundson et al.
5913028June 1999Wang et al.
5913033June 1999Grout
5918010June 1999Appleman et al.
5919247July 1999Van Hoff et al.
5920701July 1999Miller et al.
5933832August 1999Suzuoka et al.
5935207August 1999Logue et al.
5945989August 1999Freishtat et al.
5956489September 1999San Andres et al.
5956716September 1999Kenner
5958008September 1999Pogrebisky et al.
5961596October 1999Takubo et al.
5968121October 1999Logan et al.
5983214November 1999Lang et al.
5983227November 1999Nazem
5987606November 1999Cirasole et al.
5991809November 1999Kriegsman
6006264December 1999Colby et al.
6012090January 2000Chung et al.
6014686January 2000Elnozahy et al.
6026440February 2000Shrader et al.
6029175February 2000Chow et al.
6035332March 2000Ingrassia, Jr. et al.
6038610March 2000Belfiore et al.
6041324March 2000Earl et al.
6052718April 2000Gifford
6052730April 2000Felciano et al.
6065051May 2000Steele et al.
6065062May 2000Periasamy et al.
6070191May 2000Narendran et al.
6081829June 2000Sidana
6092112July 2000Fukushige
6092204July 2000Baker
6108673August 2000Brandt et al.
6108703August 2000Leighton et al.
6112231August 2000DeSimone et al.
6112240August 2000Pogue et al.
6128601October 2000Van Horne et al.
6128660October 2000Grimm et al.
6144375November 2000Jain et al.
6173311January 2001Hassett et al.
6178160January 2001Bolton et al.
6185619February 2001Joffe et al.
6189030February 2001Kirsch et al.
6282569August 2001Wallis et al.
6480893November 2002Kriegsman
6484204November 2002Rabinovich
6502215December 2002Raad et al.
Foreign Patent Documents
0 801 487Apr., 1997EP
0 824 236Feb., 1998EP
05162529Jun., 1993JP
0801487Oct., 1997EP
08328583Sep., 1996JP
2 281 793Mar., 1995GB
2202572Oct., 1998CA
2281793Mar., 1995GB
865180Sep., 1998EP
96/42041Dec., 1996WO
97/11429Mar., 1997WO
97/29423Aug., 1997WO
9804985Feb., 1998WO
WO 96/42041Dec., 1996WO
WO 97/11429Mar., 1997WO
WO9729423Aug., 1997WO
Other References
Andresen et al., "SWEB: Towards a Scalable World Wide Web Server on Multicomputers", Proceedings of IPPS, Apr. 15, 1996, pp. 850-856. .
Mourad et al., "Scalable Web Server Architectures", Proceedings IEEE Symposium on Computers and Communications, Jul. 1, 1997, pp. 12-16. .
Adler, "Distributed Coordination Models for Client/Server Computing", Computer, vol. 28, No. 4, Apr. 1, 1995, pp. 14-22. .
International Search Report dated Jun. 8, 1999 for PCT/US99/01477. .
"Local Area Network Server Replacement Procedure," IBM Technical Disclosure Bulletin, vol. 348, No. 01, Jan. 1995, pp. 235-236, ISSN: 0018-8696. .
Gwertzman, J. S., et al., The Case for Geographical push-caching, Workshop on Hot Topics in Operating Systems, May 4, 1995, pp. 51-55. .
De Bra, P. M. E. et al, "Information Retrieval in the World-Wide Web: Making client-based searching feasible," Computer Networks and ISDN System, NL, North Holland Publishing, Amsterdam, vol. 27, No. 2, Nov. 1, 1994, pp. 183-192, ISSN: 0169-7552. .
Communication (1 page) and European Search Report for Appln. No. EP 00 12 8346 (3 pages) mailed Aug. 24, 2001. .
Bestavros, Azer,"Speculative Data Dissermination and Service to Reduce Server Load, Network Traffic and Service Time in Distributed Information Systems." In Proceedings of ICDE '96: The 1996 International Conference on Data Engineering, Mar. 1996, 4 pgs. .
Carter, J. Lawrence, et al. "Universal Classes of Hash Functions." Journal of Computer and System Sciences, vol. 18, No. 2, Apr. 1979, pp. 143-154. .
Chankhunthod, Anawat, et al. "A Hierarchical Internet Object Cache." In USENIX Proceedings, Jan. 1996, pp. 153-163. .
Cormen, Thomas H., et al. Introduction to Algorithms, The MIT Press, Cambridge, Massachusetts, 1994, pp. 219-243, 991-993. .
Deering, Stephen, et al. "Multicast Routing in Datagram Internetworks and Extended LANs." ACM Transactions on Computer Systems, vol. 8, No. 2, May 1990, pp. 85-110. .
Devine, Robert, "Design and Implementation of DDH: A Distributed Dynamic Hashing Algorithm." In Proceedings of 4th International Conference on Foundations of Data Organizations and Algorithms, 1993, pp. 101-114. .
Grigni, Michelangelo, et al. "Tight Bounds on Minimum Broadcasts Networks." SIAM Journal of Discrete Mathematics, vol. 4, No. 2, May 1991, pp. 207-222. .
Gwertzman, James, et al. "The Case for Geographical Push-Caching." Technical Report HU TR 34-94 (excerpt), Harvard University, DAS, Cambridge, MA 02138, 1994, 2 pgs. .
Gwertzman, James et al. "World-Wide Web Cache Consistency." In Proceedings of the 1996 USENIX Technical Conference, Jan. 1996, 8 pgs. .
Feeley, Michael, et al. "Implementing Global Memory Management in a Workstation Cluster." In Proceedings of the 15th ACM Symposium on Operating Systems Principles, 1995, pp. 201-212. .
Floyd, Sally, et al. "A Reliable Multicast Framework for Light-Weight Sessions and Application Level Framing." In Proceeding of ACM SIGCOMM '95, pp. 342-356. .
Fredman, Michael, et al. "Storing a Sparse Table with 0(1) Worst Case Access Time." Journal of the Association for Computing Machinery, vol. 31, No. 3, Jul. 1984, pp. 538-544. .
Karger, David, et al. "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web." In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, May 1997, pp. 654-663. .
Litwin, Wlthold, et al. "LH-A Scaleable, Distributed Data Structure." ACM Transactions on Database Systems, vol. 21, No. 4, Dec. 1996, pp. 480-525. .
Malpani, Radhika, et al. "Making World Wide Web Caching Servers Cooperate." In Proceedings of World Wide Web Conference, 1996, 6 pgs. .
Naor, Moni, et al. "The Load, Capacity and Availability of Quorum Systems." In Proceedings of the 35th IEEE Symposium on Foundations of Computer Science, Nov. 1994, pp. 214-225. .
Nisan, Noam. "Psuedorandom Generators for Space-Bounded Computation." In Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, May 1990, pp. 204-212. .
Palmer, Mark, et al. "Fido: A Cache that Learns to Fetch." In Proceedings of the 17th International Conference on Very Large Data Bases, Sep. 1991, pp. 255-264. .
Panigrahy, Rina. Relieving Hot Spots on the World Wide Web. Massachusetts Institute of Technology, Jun. 1997, pp. 1-66. .
Peleg, David, et al. "The Availabiltiy of Quorum Systems." Information and Computation 123, 1995, 210-223. .
Plaxton, C. Greg, et al. "Fast Fault-Tolerant Concurrent Access to Shared Objects." In Proceedings of 37th IEEE Symposium of Foundations of Computer Science, 1996, pp. 570-579. .
Rabin, Michael. "Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance." Journal of the ACM, vol. 36, No. 2, Apr. 1989, pp. 335-348. .
Ravi, R., "Rapid Rumor Ramification: Approximating the Minimum Broadcast Time." In Proceedings of the 35th IEEE Symposium on Foundation of Computer Science, Nov. 1994, pp. 202-212. .
Schmidt, Jeanette, et al. "Chernoff-Hoeffding Bounds for Applications with Limited Independence." In Proceedings of the 4th ACS-SIAM Symposium on Discrete Algorithms, 1993, pp. 331-340. .
Tarjan, Robert Endre, et al. "Storing a Sparse Table." Communications of the ACM, vol. 22, No. 11, Nov. 1979, pp. 606-611. .
Vitter, Jeffrey Scott, et al. "Optimal Prefetching via Data Compression." In Proceedings of 32nd IEEE Symposium on Foundations of Computer Science, Nov. 1991, pp. 121-130. .
Wegman, Mark, et al. "New Hash Functions and Their Use in Authentication and Set Equality." Journal of Computer and System Sciences vol. 22, Jun. 1981, pp. 265-279. .
Yao, Andrew Chi-Chih. "Should Tables be Sorted?" Journal of the Association for Computing Machinery, vol. 28, No. 3, Jul. 1981, pp. 615-628. .
Patent Abstracts of Japan, Method and Device for Repeating and Converting Information, Sep. 1996, JP 08328583. .
Cisco Distributed Director, CISCO Systems Inc., pp. 1-15, http://www.cisco.com/warp/public/751/distdir/dd_wp.htm, 1997. .
"Cisco Local Director Product Overview." CISCO Systems Inc., 1997, pp. 1-4, http://www.cisco.com/warp/public/751/lodir/lodir_ds.htm. .
"Local Director". Cisco Systems Inc., pp. 1-3. .
"Excite Chooses Cisco LocalDirector to Support its Growing Service." Cisco Systems Inc., 1997, pp. 1-2, http://www.cisco.com/warp/public/146/1857.html. .
"Cisco Systems Ships Cisco LocalDirector to Meet Demands of High-Volume Web Traffic." CISCO Systems Inc., 1996, pp. 1-2, http://www.cisco.com/warp/public/146/323.html. .
"Scaling the Internet News Service." Cisco Systems Inc., 1997, pp. 1-11, http://www.cisco.com/warp/public/751/lodir/news_wp.htm. .
"Scaling the World Wide Web." SISCO Systems Inc., 1996, pp. 1-7, http://www.cisco.com/wapr/public/751/lodir/swww_wp.html. .
"How to Cost-Effectivley Scale Web Servers." CISCO Systems Inc., 1996, pp. 1-5, http://www.cisco.com/warp/public/784/5.html. .
"Cisco Cache Engine." CISCO Systems Inc., 1 pg, http://www.cisco.com/warp/public/751/cache/index.shtml. .
"Cisco Introduces Web Cache Product for Scaling the Internet." CISCO Systems Inc., 1997, 1 pg, http://www.cisco.com/warp/public/146/1947.html. .
"Cache Director System," CISCO Systems Inc., 1997, pp. 1-5, http://www.cisco.com/warp/public/751/cache/cds_ov.htm. .
"Local Director." 2 pgs, http://www.cisco.com/warp/public/751/lodir/index.html. .
"Cisco Takes Global Route." PC Week News, Feb. 17, 1997, p. 23. .
"Global IP/IPX Service Should Keep Network Delays Down." Infoworld, Jan. 20, 1997, 1 pg. .
Patent Abstracts of Japan, "Electronic Mail Multiplexing System and Communication Control Method in The System." Jun. 30, 1993, JP 05162529. .
"Cache Director Systems Technology Overview." CISCO System Inc., 1997, pp. 1-3, http://www.cisco.com/warp/public/751/cache/cds_ds.htm. .
"Local Area Network Server Replacement Procedure", IBM Technical Disclosure Bulletin, vol. 38, No. 01, Jan. 1995, pp. 235-236. .
Gwertzman, James, et al. "The Case for Geographical Push-Caching", May 4, 1995, pp. 51-55. .
De Bra, P.M.E., et al. "Information retrieval in the World-Wide Web: Making client-based searching feasible", Computer Networks and ISDN Systems 27, (1994), pp. 183-192..~
Primary Examiner: Geckil; Mehmet B.
Attorney, Agent or Firm:Blakely Sokoloff Taylor & Zafman LLP

Parent Case Text



This is a Continuation of National application Ser. No. 09/612,598 filed Jul. 7, 2000, which is a Continuation of National application Ser. No. 09/021,506 filed Feb. 10, 1998 now U.S. Pat. No. 6,185,598.

Claims


What is claimed:
1. A method, in a system which includes (a) a repeater server network including a plurality of repeater servers, (b) a plurality of subscribers to the repeater server network, wherein the plurality of subscribers are entities that publish information via one or more origin servers, the origin servers being distinct from the plurality of repeater servers, and in which at least some of the repeater servers replicate some or all of the information available on the origin servers, (c) a repeater selector mechanism constructed and adapted to identify, for a particular client request, an appropriate repeater server from the plurality of repeater servers, and (d) a subscriber verifying mechanism constructed and adapted to verify whether an entity is any one of the plurality of subscribers to the repeater server network, the method comprising: obtaining a client request for information, wherein the client request is for a resource which is embedded in another document; identifying, by the repeater selector mechanism, a repeater server of the repeater server network to handle the client request; determining, using at least the subscriber verifying mechanism, whether the requested information is from any one of the plurality of subscribers to the repeater server network; and providing, from the repeater server identified by the repeater selector mechanism, the requested information when the client request is determined to be for information from one of the plurality of subscribers to the repeater server network.

2. A method as in claim 1 further comprising: rejecting the client request when it is not for information from any one of the plurality of subscribers.

3. A method as in claim 2, further comprising: when the client request is for information from a subscriber being the one of the plurality of subscribers, determining whether the requested information is cached locally at the repeater server, and, based, at least in part, on said determining, if the requested information is cached locally, retrieving the requested information; and when the requested information is determined not to be cached locally, obtaining the requested information either from an origin server of the subscriber from a peer cache.

4. A method as in claim 3, further comprising: upon obtaining the requested information, caching the requested information locally.

5. A method as in claim 4 further comprising: constructing a reply including the requested information; and sending the reply to a client initiating the client request.

6. A method as in claim 5 further comprising: recording details of a transaction with the repeater server network.

7. A method as in claim 6 wherein the details include one or more of: (a) a current time, (b) an address of a requester, (c) a Uniform Resource Locator (URL) requested, (d) a type of response generated by the repeater server, (e) whether the client request was served by the repeater server from the cache; (f) whether the client request was served by the repeater server alter filling a cache; (g) whether a request pending invalidation was served by the repeater server; (h) a duration indicative of the time required to serve the resource, and (i) bandwidth used to serve the resource.

8. A method, in a system which includes (a) a repeater server network including a plurality of repeater servers, (b) a plurality of subscribers to the repeater server network, the plurality of subscribers being entities that publish information via one or more origin servers, and in which the origin servers are distinct from the plurality of repeater servers, and in which at least some of the plurality of repeater servers replicate some or all of the information available on at least some of the origin servers, (c) a repeater selector mechanism constructed and adapted to identify, for a particular client request, an appropriate repeater server from the plurality of repeater servers, and (d) a subscriber verifying mechanism constructed and adapted to verify whether an entity is any one of the plurality of subscribers to the repeater server network, method comprising: obtaining a client request for information by a repeater server of the plurality of repeater servers forming the repeater server network, the repeater server being identified by the repeater selector mechanism, wherein the client request is for a resource which is embedded in another document; determining, using at least the subscriber verifying mechanism and based, at least in part, on a name by which the repeater server was addressed, whether the requested information is from any one of the plurality of entities that publish information to the repeater server network; and when the client request is determined to be for information from one of the plurality of entities that publish information to the repeater server network, serving the requested information from the repeater sewer as identified by the repeater selector mechanism.

9. A method, in a system which includes (a) a repeater server network including a plurality of repeater servers, (b) a plurality of subscribers to the repeater server network, the plurality of subscribers being entities that publish information via one or more origin servers, and in which the origin servers are distinct from the plurality of repeater servers, and in which at least some of the plurality of repeater servers replicate some or all of the information available on at least some of the origin servers, (c) a repeater selector mechanism constructed and adapted to identify, for a particular client request, an appropriate repeater server from the plurality of repeater servers, and (d) a subscriber verifying mechanism constructed and adapted to verify whether an entity is any one of the plurality of subscribers to the repeater server network, the method comprising: obtaining a client request for information, the client request being for a resource which is embedded in another document which is sewed by an origin server of the one or more origin servers; identifying, by the repeater selector mechanism, a repeater server of the repeater server network to handle the client request; determining, using at least the subscriber verifying mechanism. and based, at least in part, on an origin sewer name in a Uniform Resource Locator (URL) associated with the client request, whether the requested information is from any one of the plurality of entities that publish information to the repeater server network; and when the client request is determined to be for information from one of the plurality of entities that publish information to the repeater server network, serving the requested information from the repeater server as identified by the repeater selector mechanism.

10. A method as in any one of claims 8 and 9, the determining further comprises: using at least information in an Hypertext Transfer Protocol (HTTP) header of the client request to determine a name by which the repeater server was addressed.

11. A method, in a system which includes (a) a repeater server network including a plurality of repeater servers, (b) a plurality of subscribers to the repeater server network, wherein the plurality of subscribers are entities that publish information via at least one origin server, wherein the at least one origin server is distinct from the plurality of repeater servers, and in which at least some of the plurality of repeater servers replicate some or all of the information available on the at least one origin server, (c) a repeater selector mechanism constructed and adapted to identify, for a client request, a repeater server from the plurality of repeater servers to handle the client request, and (d) a subscriber verifying mechanism constructed and adapted to verify whether an entity is any one of the plurality of subscribers to the repeater server network, the method comprising, at one of the plurality of repeater servers: (A) obtaining a client request for information from a client by the repeater server of the plurality of repeater servers forming the repeater server network, the repeater server having been identified by the repeater selector mechanism, the client request being for a resource which is embedded in another document; (B) determining, using at least the subscriber verifying mechanism, whether the requested information is from a subscriber of the plurality of subscribers, said determining being based, at least in part, on at least one of (a) a name by which the repeater server was addressed, and (b) an origin server name in a Uniform Resource Locator (URL) used to make the client request; (C) when the client request is determined to be for information from a subscriber to the repeater server network, then at the repeater server selected by the repeater selector mechanism: (C1) if is determined that the requested information is cached locally at the repeater server, retrieving the requested information from a local cache at the repeater server; and (C2) when the requested information is determined not to be cached locally at the repeater server, (C21) obtaining the requested information from the origin server or from a peer cache; (C22) caching the information locally at the repeater server; and (C3) constructing a reply including the requested information; and (C4) sending the reply to the client; and (D) when the client request is determined not to be for information from a subscriber to the repeater server network, rejecting the client request.

12. A method, in a system which includes (a) a repeater server network including a plurality of repeater servers, (b) a plurality of subscribers to the repeater server network, wherein the plurality of subscribers are entities that publish resources via at least one origin server, and wherein the at least one origin server is distinct from the plurality of repeater servers, (c) a repeater selector mechanism constructed and adapted to identify, for a particular request, a repeater server to handle the request, and (d) a subscriber verifying mechanism constructed and adapted to verify whether an entity is an one of the plurality of subscribers to the repeater server network, the method comprising: on at least some of the repeater sewers in the repeater server network, replicating some or all of the information available on the at least one origin server; upon receipt of a client request for information, the client request being for a resource which is embedded in another document, determining, using at least the subscriber verifying mechanism, whether the client request is for information from a known entity that publishes resources to the repeater server network; and serving the information, from the repeater server identified by the repeater selector mechanism, when the client request is determined to be for information from one of the plurality of subscribers being an entity that publishes resources to the repeater server network.

13. A method as in claim 12 further comprising: rejecting the client request when it is not for information from a subscriber.

14. A method as in claim 13, further comprising: if the client request is for information from a subscriber, if the requested information is cached locally, retrieving the requested information from a cache; otherwise obtaining the requested information from the at least one origin server or from a per cache.

15. A method as in claim 14 further comprising: upon obtaining the requested information, caching the information locally.

16. A method as in claim 15 further comprising: constructing a reply including the requested information, and sending the reply to a client issuing the client request.

17. A method as in claim 16 further comprising: recording details about a transaction between the client and the repeater server network.

18. A method as in claim 17 wherein the details include one or more of: (a) a current time, (b) an address of a requester, (c) a Uniform Resource Locator GIRL) requested, (d) a type of response generated, (e) whether the client request was served by a repeater server from the cache; (1) whether the client request was served by a repeater server after filling a cache; (g) whether a request pending invalidation was served by a repeater server; (h) a duration indicative of the time required to serve the resource, and (i) bandwidth used to serve the resource.

19. A method, in a system which includes (a) a repeater server network including a plurality of repeater servers, (b) a plurality of sub scribers to the repeater server network, wherein the plurality of subscribers are entities that publish resources via one or more origin servers, wherein the origin servers are distinct from the plurality of repeater servers, (c) a repeater selector mechanism constructed and adapted to identify, for a client request, a repeater server from the plurality of repeater servers to handle the client request, and (d) a subscriber verifying mechanism constructed and adapted to verify whether an entity is an one of the plurality of subscribers to the repeater server network, the method comprising: on at least some of the plurality of repeater servers in the repeater sewer network, replicating some or all of the information available on the one or more origin servers; identifying, by the repeater selector mechanism, a repeater server of the repeater server network to handle the client request; upon receipt of the client request for information, the client request being for a resource which is embedded in another document, determining, by the subscriber verifying mechanism, and based, at least in part, on a name by which the repeater server in receipt of the client request was addressed, whether the client request is for information from a known entity that publishes resources to the repeater server network; and serving the information from the repeater server selected by the repeater selector mechanism when the client request is determined to be for information from a known entity that publishes resources to the repeater server network.

20. A method as in claim 19, wherein the repeater server in receipt of the request uses at least information in an Hypertext Transfer Protocol (HTTP) header of the client request to determine the name by which the repeater server was addressed.

21. A method, in a system which includes (a) a repeater server network including a plurality of repeater servers, (b) a plurality of subscribers to the repeater server network, wherein the plurality of subscribers are entities that publish resources via one or more origin servers, wherein the origin servers are distinct from the repeater servers, (c) a repeater selector mechanism constructed and adapted to identify, for a client request, a repeater server from the plurality of repeater servers to handle the request, and (d) a subscriber verifying mechanism constructed and adapted to verify whether an entity is any one of the plurality of subscribers to the repeater server network, the method comprising: on at least some of the repeater servers in the repeater server network, replicating some or all of the information available on the one or more origin servers; upon receipt of a client request for information, the request being for a resource which is embedded in another document, determining, using at least the subscriber verifying mechanism, and based, at least in part, on an origin server name in a Uniform Resource Locator (URL) used to make the client request, whether the client request is for information from a known entity that publishes resources to the repeater server network; and serving the information from the repeater server identified by the repeater selector mechanism when the client request is determined to be for information from a known entity that publishes resources to the repeater server network.

22. In a computer network which includes (a) a plurality of origin servers, (b) a plurality of repeater servers distinct from the plurality of origin servers and forming at least one repeater server network, (c) a repeater selector mechanism for identifying, for a request from a client, and based, at least in part, on a location of the client in the computer network, a repeater server of the plurality of repeater servers to handle the request, and (d) a subscriber verifying mechanism constructed and adapted to verify whether an entity is any one of the plurality of subscribers to the repeater server network, and in which at least one of the plurality of repeater servers replicates some or all of the information available on the plurality of origin servers, and in which requests for resources may be handled by the repeater server network in order for the resources to be served, a method comprising: obtaining a client request from the client for a resource, the resource being embedded in another document if the client request for the resource is not for a resource from a subscriber to the repeater server network, as determined at least in part by the subscriber verifying mechanism rejecting the client request, otherwise constructing a reply including the resource and serving that reply from the repeater server identified by the repeater selector mechanism.

23. A method as in claim 22, wherein the resource is obtained by the repeater server of the plurality of repeater servers from the origin server of the plurality of origin servers, and wherein the repeater server determines the origin server from which to obtain the resources based, at least in part, upon the name by which the repeater server was addressed.

24. A method as in claim 22 wherein the resource is obtained by the repeater server of the plurality of repeater servers from the origin server of the plurality of origin servers, and wherein the repeater server determines the origin server from which to obtain the resources based, at least in part, upon information in an Hypertext Transfer Protocol (HTTP) header.

25. In a computer network which includes (a) a plurality of origin servers, and (b) at least one repeater server network including a plurality of repeater servers distinct from the plurality of origin servers, (c) a repeater selector mechanism for identifying an appropriate repeater server to handle a request, and in which at least one of the plurality of repeater servers replicates a portion of the information available on at least one of the plurality of origin servers, and (d) a subscriber verifying mechanism constructed and adapted to verify whether an entity is any one of a plurality of subscribers to the repeater server network, and in which requests may be handled by the repeater server network in order for resources to be served, a method comprising: obtaining a request from a client for a resource, the resource being embedded in another document; identifying, by the repeater selector mechanism, a repeater server of the repeater server network to handle the request; if it is determined, at least in part by the subscriber verifying mechanism that the request is for information from a known subscriber, determining whether the resource is cached locally at the repeater server selected by the repeater selector mechanism, and, based, at least in part, on said determining, if the requested resource is in a cache of the repeater server, retrieving the resource; and when the resource is determined not to be cached locally at the repeater server, attempting to obtain the resource from the origin server of the plurality of origin servers or a peer cache, wherein the repeater server determines which origin server of the plurality of origin servers to use based, at least in part, upon the name by which the repeater server was addressed.

26. A server including a processor adapted and programmed to: (i) replicate at least a portion of information available on a plurality of origin servers distinct from the server; (ii) obtain a client request for information, the client request being for a resource which is embedded in another document; (iii) determine, based, at least in part, on a name by which the server was addressed, and using a subscriber verifying mechanism constructed and adapted to verify that the name belongs to a subscriber to a repeater server network which includes the server, whether the client request is for information from a known subscriber to the repeater server network which includes the server; and (iv) serve the requested information to the client when the request is determined to be for information from the known subscriber to the repeater server network.

27. A server including a processor adapted and programmed to: (i) replicate at least a portion of information available on a plurality of origin servers distinct from the server; (ii) obtain a client request for information, the client request being for a resource which is embedded in another document served from an origin server of the plurality of origin servers; (iii) determine, based, at least in part, on at least an origin server name in a Uniform Resource Locator (URL) used to make the client request, and using subscriber verifying mechanism constructed and adapted to verify that the origin server name belongs to a subscriber to a repeater server network which includes the server, whether the client request is for information from a known subscriber to a repeater server network which includes the server; and (iv) serve the information to the client when the client request is determined to be for information from the known subscriber to the repeater server network.

28. A server as in any one of claims 26 and 27, wherein the server uses at least information in an Hypertext Transfer Protocol (HTTP) header to determine the name by which the server was addressed.

29. A server as in claim 26 further programmed to: reject the client request if the client request is not for information from a known subscriber.

30. A server as in claim 26, further programmed to: if it is determined tat the client request is for information from the known subscriber, determine whether the information is cached locally at the server, and, based, at least in part, on said determining, if the information is cached locally, retrieve the information; and when the information is determined not to be cached locally, attempt to obtain the information from the origin server or from a peer cache.

31. A server as in claim 30 further programmed to: upon obtaining the information, cache the information locally.

32. A server as in claim 31 further programmed to: construct and send a reply including the information.

33. A server as in claim 32 further programmed to: record details about a transaction involving the client request.

34. A server as in claim 33 wherein the details include one or more of: the current time, the address of the client issuing the client request, the Uniform Resource Locator (URL) requested, and a type of response generated by the server.

35. A method as in claim 1 wherein, if the repeater server fails, another repeater server of the plurality of repeater servers will take over the role of the failed repeater server.

36. A method as in any one of claims 1-9, 11-21 wherein the repeater selector mechanism identifies the repeater server based, at least in part, on the load on at least some of the repeater servers.

37. A method as in any one of claims 1-9, 1-21 wherein the repeater selector mechanism identifies the repeater server based, at least in part, on a location on the network of a client sending the client request.

38. A method as in claim 1, wherein the repeater selector mechanism includes a network map for use in directing the client request for information.

39. A method as in claim 1 wherein the repeater selector mechanism is co-located wit one of the one or more origin servers.

40. A method as in any one of claims 1-9, 11-21 wherein the repeater selector mechanism identifies the repeater server based, at least in part, on at least (a) a load on at least some of the plurality of repeater servers forming the repeater server network, and (b) a location of a client sending The client request.

41. A method as in any one of claims 1-9, 11-21 wherein the repeater selector mechanism identifies the repeater server based, at least in part, on a relative cost of transmitting requested information between the repeater server and a client sending the client request.

42. A method as in any of claims 1-9, 11-21, wherein the other document is an HTML (Hyper Text Markup Language) document or an XML document which was served from the origin server.

43. A method as in any of claims 1-9, 11-21, wherein the document in which the client request was embedded was served from the origin server.

44. A method as in claim 42 wherein the requested resource is a video stream.

45. A method as in any of claims 1-9, 11-21, 22-25, wherein the subscriber verifying mechanism is located at a repeater server.

46. A method as in any claims 1-9, 11-21, 22-25, wherein the subscriber verifying mechanism comprises a table used to verify that an origin server belongs to a known subscriber.

47. A server as in any of claims 26 and 27, wherein the subscriber verifying mechanism is located at a repeater server.

48. A method as in any of claims 26 and 27 wherein the subscriber verifying mechanism comprises a table used to verify that an origin server belongs to a known subscriber.

Description

FIELD OF THE INVENTION

This invention relates to replication of resources in computer networks.

BACKGROUND OF THE INVENTION

The advent of global computer networks, such as the Internet, have led to entirely new and different ways to obtain information. A user of the Internet can now access information from anywhere in the world, with no regard for the actual location of either the user or the information. A user can obtain information simply by knowing a network address for the information and providing that address to an appropriate application program such as a network browser.

The rapid growth in popularity of the Internet has imposed a heavy traffic burden on the entire network. Solutions to problems of demand (e.g., better accessibility and faster communication links) only increase the strain on the supply. Internet Web sites (referred to here as "publishers") must handle ever-increasing bandwidth needs, accommodate dynamic changes in load, and improve performance for distant browsing clients, especially those overseas. The adoption of content-rich applications, such as live audio and video, has further exacerbated the problem.

To address basic bandwidth growth needs, a Web publisher typically subscribes to additional bandwidth from an Internet service provider (ISP), whether in the form of larger or additional "pipes" or channels from the ISP to the publisher's premises, or in the form of large bandwidth commitments in an ISP's remote hosting server collection. These increments are not always as fine-grained as the publisher needs, and quite often lead times can cause the publisher's Web site capacity to lag behind demand.

To address more serious bandwidth growth problems, publishers may develop more complex and costly custom solutions. The solution to the most common need, increasing capacity, is generally based on replication of hardware resources and site content (known as mirroring), and duplication of bandwidth resources. These solutions, however, are difficult and expensive to deploy and operate. As a result, only the largest publishers can afford them, since only those publishers can amortize the costs over many customers (and Web site hits).

A number of solutions have been developed to advance replication and mirroring. In general, these technologies are designed for use by a single Web site and do not include features that allow their components to be shared by many Web sites simultaneously.

Some solution mechanisms offer replication software that helps keep mirrored servers up-to-date. These mechanisms generally operate by making a complete copy of a file system. One such system operates by transparently keeping multiple copies of a file system in synch. Another system provides mechanisms for explicitly and regularly copying files that have changed. Database systems are particularly difficult to replicate, as they are continually changing. Several mechanisms allow for replication of databases, although there are no standard approaches for accomplishing it. Several companies offering proxy caches describe them as replication tools. However, proxy caches differ because they are operated on behalf of clients rather than publishers.

Once a Web site is served by multiple servers, a challenge is to ensure that the load is appropriately distributed or balanced among those servers. Domain name-server-based round-robin address resolution causes different clients to be directed to different mirrors.

Another solution, load balancing, takes into account the load at each server (measured in a variety of ways) to select which server should handle a particular request.

Load balancers use a variety of techniques to route the request to the appropriate server. Most of those load-balancing techniques require that each server be an exact replica of the primary Web site. Load balancers do not take into account the "network distance" between the client and candidate mirror servers.

Assuming that client protocols cannot easily change, there are two major problems in the deployment of replicated resources. The first is how to select which copy of the resource to use. That is, when a request for a resource is made to a single server, how should the choice of a replica of the server (or of that data) be made. We call this problem the "rendezvous problem". There are a number of ways to get clients to rendezvous at distant mirror servers. These technologies, like load balancers, must route a request to an appropriate server, but unlike load balancers, they take network performance and topology into account in making the determination.

A number of companies offer products which improve network performance by prioritizing and filtering network traffic. Proxy caches provide a way for client aggregators to reduce network resource consumption by storing copies of popular resources close to the end users. A client aggregator is an Internet service provider or other organization that brings a large number of clients operating browsers to the Internet. Client aggregators may use proxy caches to reduce the bandwidth required to serve web content to these browsers. However, traditional proxy caches are operated on behalf of Web clients rather than Web publishers.

Proxy caches store the most popular resources from all publishers, which means they must be very large to achieve reasonable cache efficiency. (The efficiency of a cache is defined as the number of requests for resources which are already cached divided by the total number of requests.)

Proxy caches depend on cache control hints delivered with resources to determine when the resources should be replaced. These hints are predictive, and are necessarily often incorrect, so proxy caches frequently serve stale data. In many cases, proxy cache operators instruct their proxy to ignore hints in order to make the cache more efficient, even though this causes it to more frequently serve stale data.

Proxy caches hide the activity of clients from publishers. Once a resource is cached, the publisher has no way of knowing how often it was accessed from the cache.

SUMMARY OF THE INVENTION

This invention provides a way for servers in a computer network to off-load their processing of requests for selected resources by determining a different server (a "repeater") to process those requests. The selection of the repeater can be made dynamically, based on information about possible repeaters.

If a requested resource contains references to other resources, some or all of these references can be replaced by references to repeaters.

Accordingly, in one aspect, this invention is a method of processing resource requests in a computer network. First a client makes a request for a particular resource from an origin server, the request including a resource identifier for the particular resource, the resource identifier sometimes including an indication of the origin server. Requests arriving at the origin server do not always include an indication of the origin server; since they are sent to the origin server, they do not need to name it. A mechanism referred to as a reflector, co-located with the origin server, intercepts the request from the client to the origin server and decides whether to reflect the request or to handle it locally. If the reflector decides to handle the request locally, it forwards it to the origin server, otherwise it selects a "best" repeater to process the request. If the request is reflected, the client is provided with a modified resource identifier designating the repeater.

The client gets the modified resource identifier from the reflector and makes a request for the particular resource from the repeater designated in the modified resource identifier.

When the repeater gets the client's request, it responds by returning the requested resource to the client. If the repeater has a local copy of the resource then it returns that copy, otherwise it forwards the request to the origin server to get the resource, and saves a local copy of the resource in order to serve subsequent requests.

The selection by the reflector of an appropriate repeater to handle the request can be done in a number of ways. In the preferred embodiment, it is done by first pre-partitioning the network into "cost groups" and then determining which cost group the client is in. Next, from a plurality of repeaters in the network, a set of repeaters is selected, the members of the set having a low cost relative to the cost group which the client is in. In order to determine the lowest cost, a table is maintained and regularly updated to define the cost between each group and each repeater. Then one member of the set is selected, preferably randomly, as the best repeater.

If the particular requested resource itself can contain identifiers of other resources, then the resource may be rewritten (before being provided to the client). In particular, the resource is rewritten to replace at least some of the resource identifiers contained therein with modified resource identifiers designating a repeater instead of the origin server. As a consequence of this rewriting process, when the client requests other resources based on identifiers in the particular requested resource, the client will make those requests directly to the selected repeater, bypassing the reflector and origin server entirely.

Resource rewriting must be performed by reflectors. It may also be performed by repeaters, in the situation where repeaters "peer" with one another and make copies of resources which include rewritten resource identifiers that designate a repeater.

In a preferred embodiment, the network is the Internet and the resource identifier is a uniform resource locator (URL) for designating resources on the Internet, and the modified resource identifier is a URL designating the repeater and indicating the origin server (as described in step B3 below), and the modified resource identifier is provided to the client using a REDIRECT message. Note, only when the reflector is "reflecting" a request is the modified resource identifier provided using a REDIRECT message.

In another aspect, this invention is a computer network comprising a plurality of origin servers, at least some of the origin servers having reflectors associated therewith, and a plurality of repeaters.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other objects and advantages of the invention will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which the reference characters refer to like parts throughout and in which:

FIG. 1 depicts a portion of a network environment according to the present invention; and

FIGS. 2-6 are flow charts of the operation of the present invention.

DETAILED DESCRIPTION OF THE PRESENTLY PREFERRED EXEMPLARY EMBODIMENTS

Overview

FIG. 1 shows a portion of a network environment 100 according to the present invention, wherein a mechanism (reflector 108, described in detail below) at a server (herein origin server 102) maintains and keeps track of a number of partially replicated servers or repeaters 104a, 104b, and 104c. Each repeater 104a, 104b, and 104c replicates some or all of the information available on the origin server 102 as well as information available on other origin servers in the network 100. Reflector
108 is connected to a particular repeater known as its "contact" repeater ("Repeater B" 104b in the system depicted in FIG. 1). Preferably each reflector maintains a connection with a single repeater known as its contact, and each repeater maintains a connection with a special repeater known as its master repeater (e.g., repeater 104m for repeaters 104a, 104b and 104c in FIG. 1).

Thus, a repeater can be considered as a dedicated proxy server that maintains a partial or sparse mirror of the origin server 102, by implementing a distributed coherent cache of the origin server. A repeater may maintain a (partial) mirror of more than one origin server. In some embodiments, the network 100 is the Internet and repeaters mirror selected resources provided by origin servers in response to clients' HTTP (hypertext transfer protocol) and FTP (file transfer protocol) requests.

A client 106 connects, via the network 100, to origin server 102 and possibly to one or more repeaters 104a etc.

Origin server 102 is a server at which resources originate. More generally, the origin server 102 is any process or collection of processes that provide resources in response to requests from a client 106. Origin server 102 can be any off-the-shelf Web server. In a preferred embodiment, origin server 102 is typically a Web server such as the Apache server or Netscape Communications Corporation's Enterprise.TM. server.

Client 106 is a processor requesting resources from origin server 102 on behalf of an end user. The client 106 is typically a user agent (e.g., a Web browser such as Netscape Communications Corporation's Navigator.TM.) or a proxy for a user agent. Components other than the reflector 108 and the repeaters 104a, 104b, etc., may be implemented using commonly available software programs. In particular, this invention works with any HTTP client (e.g., a Web browser), proxy cache, and Web server. In addition, the reflector 108 might be fully integrated into the data server 112 (for instance, in a Web Server). These components might be loosely integrated based on the use of extension mechanisms (such as so-called add-in modules) or tightly integrated by modifying the service component specifically to support the repeaters.

Resources originating at the origin server 102 may be static or dynamic. That is, the resources may be fixed or they may be created by the origin server 102 specifically in response to a request. Note that the terms "static" and "dynamic" are relative, since a static resource may change at some regular, albeit long, interval.

Resource requests from the client 106 to the origin server 102 are intercepted by reflector 108 which for a given request either forwards the request on to the origin server 102 or conditionally reflects it to some repeater 104a, 104b, etc. in the network 100. That is, depending on the nature of the request by the client 106 to the origin server 102, the reflector 108 either serves the request locally (at the origin server 102), or selects one of the repeaters (preferably the best repeater for the job) and reflects the request to the selected repeater. In other words, the reflector 108 causes requests for resources from origin server 102, made by client 106, to be either served locally by the origin server 102 or transparently reflected to the best repeater 104a, 104b, etc. The notion of a best repeater and the manner in which the best repeater is selected are described in detail below.

Repeaters 104a, 104b, etc. are intermediate processors used to service client requests thereby improving performance and reducing costs in the manner described herein. Within repeaters 104a, 104b, etc., are any processes or collections of processes that deliver resources to the client 106 on behalf of the origin server 102. A repeater may include a repeater cache 110, used to avoid unnecessary transactions with the origin server 102.

The reflector 108 is a mechanism, preferably a software program, that intercepts requests that would normally be sent directly to the origin server 102. While shown in the drawings as separate components, the reflector 108 and the origin server
102 are typically co-located, e.g., on a particular system such as data server 112. (As discussed below, the reflector 108 may even be a "plug in" module that becomes part of the origin server 102.

FIG. 1 shows only a part of a network 100 according to this invention. A complete operating network consists of any number of clients, repeaters, reflectors, and origin servers. Reflectors communicate with the repeater network, and repeaters in the network communicate with one another.

Uniform Resource Locators

Each location in a computer network has an address which can generally be specified as a series of names or numbers. In order to access information, an address for that information must be known. For example, on the World Wide Web ("the Web") which is a subset of the Internet, the manner in which information address locations are provided has been standardized into Uniform Resource Locators (URLs). URLs specify the location of resources (information, data files, etc.) on the network.

The notion of URLs becomes even more useful when hypertext documents are used. A hypertext document is one which includes, within the document itself, links (pointers or references) to the document itself or to other documents. For example, in an on-line legal research system, each case may be presented as a hypertext document. When other cases are cited, links to those cases can be provided. In this way, when a person is reading a case, they can follow cite links to read the appropriate parts of cited cases.

In the case of the Internet in general and the World Wide Web specifically, documents can be created using a standardized form known as the Hypertext Markup Language (HTML). In, HTML, a document consists of data (text, images, sounds, and the like), including links to other sections of the same document or to other documents. The links are generally provided as URLs, and can be in relative or absolute form. Relative URLs simply omit the parts of the URL which are the same as for the document including the link, such as the address of the document (when linking to the same document), etc. In general, a browser program will fill in missing parts of a URL using the corresponding parts from the current document, thereby forming a fully formed URL including a fully qualified domain name, etc.

A hypertext document may contain any number of links to other documents, and each of those other documents may be on a different server in a different part of the world. For example, a document may contain links to documents in Russia, Africa, China and Australia. A user viewing that document at a particular client can follow any of the links transparently (i.e., without knowing where the document being linked to actually resides). Accordingly, the cost (in terms of time or money or resource allocation) of following one link versus another may be quite significant.

URLs generally have the following form (defined in detail in T. Berners-Lee et al, Uniform Resource Locators (URL), Network Working Group, Request for Comments: 1738, Category: Standards Track, December 1994, located at "http://ds.internic.net/rfc/rfc1738.txt", which is hereby incorporated herein by reference):

where "scheme" can be a symbol such as "file" (for a file on the local system), "ftp" (for a file on an anonymous FTP file server), "http" (for a file on a file on a Web server),and "telnet" (for a connection to a Telnet-based service). Other schemes, can also be used and new schemes are added every now and then. The port number is optional, the system substituting a default port number (depending on the scheme) if none is provided. The "host" field maps to a particular network address for a particular computer. The "url-path" is relative to the computer specified in the "host" field. A url-path is typically, but not necessarily, the pathname of a file in a web server directory.

For example, the following is a URL identifying a file "F" in the path "A/B/C" on a computer at "www.uspto.gov":

In order to access the file "F" (the resource) specified by the above URL, a program (e.g., a browser) running on a user's computer (i.e., a client computer) would have to first locate the computer (i.e., a server computer) specified by the host name. I.e., the program would have to locate the server "www.uspto.gov". To do this, it would access a Domain Name Server (DNS), providing the DNS with the host name ("www.uspto.gov"). The DNS acts as a kind of centralized directory for resolving addresses from names. If the DNS determines that there is a (remote server) computer corresponding to the name "www.upto.gov", it will provide the program with an actual computer network address for that server computer. On the Internet this is called an Internet Protocol (or IP) address and it has the form "123.345.456.678". The program on the user's (client) computer would then use the actual address to access the remote (server) computer.

The program opens a connection to the HTTP server. (Web server) on the remote computer "www.uspto.gov" and uses the connection to send a request message to the remote computer (using the HTTP scheme). The message is typically an HTTP GET request which includes the url-path of the requested resource, "A/B/C/F". The HTTP server receives the request and uses it to access the resource specified by the url-path "A/B/C/F". The server returns the resource over the same connection.

Thus, conventionally HTTP client requests for Web resources at an origin server 102 are processed as follows (see FIG. 2) (This is a description of the process when no reflector 108 is installed.): A1. A browser (e.g., Netscape's Navigator) at the client receives a resource identifier (i.e., a URL) from a user. A2. The browser extracts the host (origin server) name from the resource identifier, and uses a domain name server (DNS) to look up the network (IP) address of the corresponding server. The browser also extracts a port number, if one is present, or uses a default port number (the default port number for http requests is 80). A3. The browser uses the server's network address and port number to establish a connection between the client 106 and the host or origin server 102. A4. The client 106 then sends a (GET) request over the connection identifying the requested resource. A5. The origin server 102 receives the request and A6. locates or composes the corresponding resource. A7. The origin server 102 then sends back to the client 106 a reply containing the requested resource (or some form of error indicator if the resource is unavailable). The reply is sent to the client over the same connection as that on which the request was received from the client. A8. The client 106 receives the reply from the origin server 102.

There are many variations of this basic model. For example, in one variation, instead of providing the client with the resource, the origin server can tell the client to re-request the resource by another name. To do so, in A7 the server 102
sends back to the client 106 a reply called a "REDIRECT" which contains a new URL indicating the other name. The client 106 then repeats the entire sequence, normally without any user intervention, this time requesting the resource identified by the new URL.

System Operation

In this invention reflector 108 effectively takes the place of an ordinary Web server or origin server 102. The reflector 108 does this by taking over the origin server's IP address and port number. In this way, when a client tries to connect to the origin server 102, it will actually connect to the reflector 108. The original Web server (or origin server 102) must then accept requests at a different network (IP) address, or at the same IP address but on a different port number. Thus, using this invention, the server referred to in A3-A7 above is actually a reflector 108.

Note that it is also possible to leave the origin server's network address as it is and to let the reflector run at a different address or on a different port. In this way the reflector does not intercept requests sent to the origin server, but can still be sent requests addressed specifically to the reflector. Thus the system can be tested and configured without interrupting its normal operation.

The reflector 108 supports the processing as follows (see FIG. 3): upon receipt of a request, B1. The reflector 108 analyzes the request to determine whether or not to reflect the request. To do this, first the reflector determines whether the sender (client 106) is a browser or a repeater. Requests issued by repeaters must be served locally by the origin server 102. This determination can be made by looking up the network (IP) address of the sender in a list of known repeater network (IP) addresses. Alternatively, this determination could be made by attaching information to a request to indicate that the request is from a specific repeater, or repeaters can request resources from a special port other than the one used for ordinary clients. B2. If the request is not from a repeater, the reflector looks up the requested resource in a table (called the "rule base") to determine whether the resource requested is "repeatable". Based on this determination, the reflector either reflects the request (B3, described below) or serves the request locally (B4, described below). The rule base is a list of regular expressions and associated attributes. (Regular expressions are well-known in the field of computer science. A small bibliography of their use is found in Aho, et al., "Compilers, Principles, techniques and tools", Addison-Wesley, 1986, pp. 157-158.) The resource identifier (URL) for a given request is looked up in the rule base by matching it sequentially with each regular expression. The first match identifies the attributes for the resource, namely repeatable or local. If there is no match in the rule base, a default attribute is used. Each reflector has its own rule base, which is manually configured by the reflector operator. B3. To reflect a request, (to serve a request locally go to B4), as shown in FIG. 4, the reflector determines (B3-1) the best repeater to reflect the request to, as described in detail below. The reflector then creates (B3-2) a new resource identifier (URL) (using the requested URL and the best repeater) that identifies the same resource at the selected repeater. It is necessary that the reflection step create a single URL containing the URL of the original resource, as well as the identity of the selected repeater. A special form of URL is created to provide this information. This is done by creating a new URL as follows: D1. Given a repeater name, scheme, origin server name and path, create a new URL. If the scheme is "http", the preferred embodiment uses the following format:

By using a rule base (B2), it is possible to selectively reflect resources. There are a number of reasons that certain particular resources cannot be effectively repeated (and therefore should not be reflected), for instance: the resource is composed uniquely for each request; the resource relies on a so-called cookie (browsers will not send cookies to repeaters with different domain names); the resource is actually a program (such as a Java applet) that will run on the client and that wishes to connect to a service Java requires that the service be running on the same machine that provided the applet).

In addition, the reflector 108 can be configured so that requests from certain network addresses (e.g., requests from clients on the same local area network as the reflector itself) are never reflected. Also, the reflector may choose not to reflect requests because the reflector is exceeding its committed aggregate information rate, as described below.

A request which is reflected is automatically mirrored at the repeater when the repeater receives and processes the request.

The combination of the reflection process described here and the caching process described below effectively creates a system in which repeatable resources are migrated to and mirrored at the selected reflector, while non-repeatable resources are not mirrored.

Alternate Approach

Placing the origin server name in the reflected URL is generally a good strategy, but it may be considered undesirable for aesthetic or (in the case, e.g., of cookies) certain technical reasons.

It is possible to avoid the need for placing both the repeater name and the server name in the URL. Instead, a "family" of names may be created for a given origin server, each name identifying one of the repeaters used by that server.

For instance, if www.example.com is the origin server, names for three repeaters might be created: wr1.example.com wr2.example.com wr3.example.com

The name "wr1.example.com" would be an alias for repeater 1, which might also be known by other names such as "wr1.anotherExample.com" and "wr1.example.edu".

If the repeater can determine by which name it was addressed, it can use this information (along with a table that associates repeater alias names with origin server names) to determine which origin server is being addressed. For instance, if repeater 1 is addressed as wr1.example.com, then the origin server is "www.example.com"; if it is addressed as "wr1.anotherExample.com", then the origin server is "www.anotherExample.com".

The repeater can use two mechanisms to determine by which alias it is addressed: 1. Each alias can be associated with a different IP address. Unfortunately, this solution does not scale well, as IP addresses are currently scarce, and the number of IP addresses required grows as the product of origin servers and repeaters. 2. The repeater can attempt to determine the alias name used by inspecting the "host:" tag in the HTTP header of the request. Unfortunately, some old browsers still in use do not attach the "host:" tag to a request. Reflectors would need to identify such browsers (the browser identity is a part of each request) and avoid this form of reflection.

How a Repeater Handles a Request

When a browser receives a REDIRECT response (as produced in B3), it reissues a request for the resource using the new resource identifier (URL) (A1-A5). Because the new identifier refers to a repeater instead of the origin server, the browser now sends a request for the resource to the repeater which processes a request as follows, with reference to FIG. 5: C1. First the repeater analyzes the request to determine the network address of the requesting client and the path of the resource requested. Included in the path is an origin server name (as described above with reference to B3). C2. The repeater uses an internal table to verify that the origin server belongs to a known "subscriber". A subscriber is an entity (e.g., a company) that publishes resources (e.g., files) via one or more origin servers. When the entity subscribes, it is permitted to utilize the repeater network. The subscriber tables described below include the information that is used to link reflectors to subscribers. If the request is not for a resource from a known subscriber, the request is rejected. To reject a request, the repeater returns a reply indicating that the requested resource does not exist. C3. The repeater then determines whether the requested resource is cached locally. If the requested resource is in the repeater's cache it is retrieved. On the other hand, if a valid copy of the requested resource is not in the repeater's cache, the repeater modifies the incoming URL, creating a request that it issues directly to the originating reflector which processes it (as in B1-B6). Because this request to the originating reflector is from a repeater, the reflector always returns the requested resource rather than reflecting the request (Recall that reflectors always handle requests from repeaters locally.) If the repeater obtained the resource from the origin server, the repeater then caches the resource locally. If a resource is not cached locally, the cache can query its "peer caches" to see if one of them contains the resource, before or at the same time as requesting the resource from the reflector/origin server. If a peer cache responds positively in a limited period of time (preferably a small fraction of a second), the resource will be retrieved from the peer cache. C4. The repeater then constructs a reply including the requested resource (which was retrieved from the cache or from the origin server) and sends that reply to the requesting client. C5. Details about the transaction, such as the associated reflector, the current time, the address of the requester, the URL requested, and the type of response generated, are written to a local log file at the repeater.

Note that the bottom row of FIG. 2 refers to an origin server, or a reflector, or a repeater, depending on what the URL in step A1 identifies.

Selecting the Best Repeater

If the reflector 108 determines that it will reflect the request, it must then select the best repeater to handle that request (as referred to in step B3-1). This selection is performed by the Best Repeater Selector (BRS) mechanism described here.

The goal of the BRS is to select, quickly and heuristically, an appropriate repeater for a given client given only the network address of the client. An appropriate repeater is one which is not too heavily loaded and which is not too far from the client in terms of some measure of network distance. The mechanism used here relies on specific, compact, pre-computed data to make a fast decision. Other, dynamic solutions can also be used to select an appropriate repeater.

The BRS relies on three pre-computed tables, namely the Group Reduction Table, the Link Cost Table, and the Load Table. These three tables (described below) are computed off-line and downloaded to each reflector by its contact in the repeater network.

The Group Reduction Table places every network address into a group, with the goal that addresses in a group share relative costs, so that they would have the same best repeater under varying conditions (i.e., the BRS is invariant over the members of the group).

The Link Cost Table is a two dimensional matrix which specifies the current cost between each repeater and each group. Initially, the link cost between a repeater and a group is defined as the "normalized link cost" between the repeater and the group, as defined below. Over time, the table will be updated with measurements which more accurately reflect the relative cost of transmitting a file between the repeater and a member of the group. The format of the Link Cost Table is <Group ID> <Group ID> <link cost>, where the Group ID's are given as AS numbers.

The Load Table is a one dimensional table which identifies the current load at each repeater. Because repeaters may have different capacities, the load is a value that represents the ability of a given repeater to accept additional work. Each repeater sends its current load to a central master repeater at regular intervals, preferably at least approximately once a minute. The master repeater broadcasts the Load Table to each reflector in the network, via the contact repeater.

A reflector is provided entries in the Load Table only for repeaters which it is assigned to use. The assignment of repeaters to reflectors is performed centrally by a repeater network operator at the master repeater. This assignment makes it possible to modify the service level of a given reflector. For instance, a very active reflector may use many repeaters, whereas a relatively inactive reflector may use few repeaters.

Tables may also be configured to provide selective repeater service to subscribers in other ways, e.g., for their clients in specific geographic regions, such as Europe or Asia.

Measuring Load

In the presently preferred embodiments, repeater load is measured in two dimensions, namely 1. requests received by the repeater per time interval (RRPT), and 2. bytes sent by the repeater per time interval (BSPT).

For each of these dimensions, a maximum capacity setting is set. The maximum capacity indicates the point at which the repeater is considered to be fully loaded. A higher RRPT capacity generally indicates a faster processor, whereas a higher BSPT capacity generally indicates a wider network pipe. This form of load measurement assumes that a given server is dedicated to the task of repeating.

Each repeater regularly calculates its current RRPT and BSPT, by accumulating the number of requests received and bytes sent over a short time interval. These measurements are used to determine the repeater's load in each of these dimensions. If a repeater's load exceeds its configured capacity, an alarm message is sent to the repeater network administrator.

The two current load components are combined into a single value indicating overall current load. Similarly, the two maximum capacity components are combined into a single value indicating overall maximum capacity. The components are combined as follows:

The factor B, a value between 0 and 1, allows the relative weights of RRPT and BSPT to be adjusted, which favors consideration of either processing power or bandwidth.

The overall current load and overall maximum capacity values are periodically sent from each repeater to the master repeater, where they are aggregated in the Load Table, a table summarizing the overall load for all repeaters. Changes in the Load Table are distributed automatically to each reflector.

While the preferred embodiment uses a two-dimensional measure of repeater load, any other measure of load can be used.

Combining Link Costs and Load

The BRS computes the cost of servicing a given client from each eligible repeater. The cost is computed by combining the available capacity of the candidate repeater with the cost of the link between that repeater and the client. The link cost is computed by simply looking it up in the Link Cost table.

The cost is determined using the following formula:

In this formula, e is a very small number (epsilon) and K is a tuning factor initial set to 0.5. This formula causes the cost to a given repeater to be increased, at a rate defined by K, if its capacity falls below a configurable threshold.

Given the cost of each candidate repeater, the BRS selects all repeaters within a delta factor of the best score. From this set, the result is selected at random.

The delta factor prevents the BRS from repeatedly selecting a single repeater when scores are similar. It is generally required because available information about load and link costs loses accuracy over time. This factor is tunable.

Best Repeater Selector (BRS)

The BRS operates as follows, with reference to FIG. 6:

Given a client network address and the three tables described above: E1. Determine which group the client is in using the Group Reduction Table. E2. For each repeater in the Link Cost Table and Load Table, determine that repeater's combined cost as follows: E2a. Determine the maximum and current load on the repeater (using the Load Table). E2b. Determine the link cost between the repeater and the client's group (using the Link Cost Table). E2c. Determine the combined cost as described above. E3. Select a small set of repeaters with the lowest cost. E4. Select a random member from the set

Preferably the results of the BRS processing are maintained in a local cache at the reflector 108. Thus, if the best repeater has recently been determined for a given client (i.e., for a given network address), that best repeater can be reused quickly without being re-determined. Since the calculation described above is based on statically, pre-computed tables, if the tables have not changed then there is no need to re-determine the best repeater.

Determining the Group Reduction and Link Cost Tables

The Group Reduction Table and Link Cost Table used in BRS processing are created and regularly updated by an independent procedure referred to herein as NetMap. The NetMap procedure is run by executing several phases (described below) as needed.

The term Group is used here to refers to an IP "address group".

The term Repeater Group refers to a Group that contains the IP address of a repeater.

The term link cost refers to a statically determined cost for transmitting data between two Groups. In a presently preferred implementation, this is the minimum of the sums of the costs of the links along each path between them. The link costs of primary concern here are link costs between a Group and a Repeater Group.

The term relative link cost refers to the link cost relative to other link costs for the same Group which is calculated by subtracting the minimum link cost from a Group to any Repeater Group from each of its link costs to a Repeater Group. The term Cost Set refers to a set of Groups that are equivalent in regard to Best Repeater Selection. That is, given the information available, the same repeater would be selected for any of them.

The NetMap procedure first processes input files to create an internal database called the Group Registry. These input files describe groups, the IP addresses within groups, and links between groups, and come a variety of sources, including publicly available Internet Routing Registry (IRR) databases, BGP router tables, and probe services that are located at various points around the Internet and use publicly available tools (such as "traceroute") to sample data paths. Once this processing is complete, the Group Registry contains essential information used for further processing, namely (1) the identity of each group, (2) the set of IP addresses in a given group, (3) the presence of links between groups indicating paths over which information may travel, and (4) the cost of sending data over a given link.

The following processes are then performed on the Group Registry file.

Calculate Repeater Group Link Costs

The NetMap procedure calculates a "link cost" for transmission of data between each Repeater Group and each Group in the Group Registry. This overall link cost is defined as the minimum cost of any path between the two groups, where the cost of a path is equal to the sum of the costs of the individual links in the path. The link cost algorithm presented below is essentially the same as algorithm #562 from ACM journal Transactions on Mathematical Software: "Shortest Path From a Specific Node to All Other Nodes in a Network" by U. Pape, ACM TOMS 6 (1980) pp. 450-455, http://www.netlib.org/toms/562.

In this processing, the term Repeater Group refers to a Group that contains the IP address of a repeater. A group is a neighbor of another group if the Group Registry indicates that there is a link between the two groups.

For each target Repeater Group T: Initialize the link cost between T and itself to zero. Initialize the link cost between T and every other Group to infinity. Create a list L that will contain Groups that are equidistant from the target Repeater Group T. Initialize the list L to contain just the target Repeater Group T itself. Wile the list L is not empty. Create an empty list L' of neighbors of members of the list L. For each Group G in the list L: For each Group N that is a neighbor of G: Let cost refer to the sum of the link cost between T and G, and the link cost between G and N. The cost between T and G was determined in the previous pass of the algorithm; the link cost between G and N is from the Group Registry. If cost is less than the link cost between T and N: Set the link cost between T and N to cost. Add N to L' if it is not already on it. Set L to L'.

Calculate Cost Sets

A Cost Set is a set of Groups that are equivalent with respect to Best Repeater Selection. That is, given the information available, the same repeater would be selected for any of them.

The "cost profile" of a Group G is defined herein as the set of costs between G and each Repeater. Two cost profiles are said to be equivalent if the values in one profile differ from the corresponding values in the other profile by a constant amount.

Once a client Group is known, the Best Repeater Selection algorithm relies on the cost profile for information about the Group. If two cost profiles are equivalent, the BRS algorithm would select the same repeater given either profile.

A Cost Set is then a set of groups that have equivalent cost profiles.

The effectiveness of this method can be seen, for example, in the case where all paths to a Repeater from some Group A pass through some other Group B. The two Groups have equivalent cost profiles (and are therefore in the same Cost Set) since whatever Repeater is best for Group A is also going to be best for Group B, regardless of what path is taken between the two Groups.

By normalizing cost profiles, equivalent cost profiles can be made identical. A normalized cost profile is a cost profile in which the minimum cost has the value zero. A normalized cost profile is computed by finding the minimum cost in the profile, and subtracting that value from each cost in the profile.

Cost Sets are then computed using the following algorithm: For each Group G: Calculate the normalized cost profile for G Look for a Cost Set with the same normalized cost profile. If such as set is found, add G to the existing Cost Set; otherwise, create a new Cost Set with the calculated normalized cost profile, containing only G.

The algorithm for finding Cost Sets employs a hash table to reduce the time necessary to determine whether the desired Cost Set already exists. The hash table uses a hash value computed from cost profile of G.

Each Cost Set is then numbered with a unique Cost Sent Index number. Cost Sets are then used in a straightforward manner to generate the Link Cost Table, which gives the cost from each Cost Set to each Repeater.

As described below, the Group Reduction Table maps every IP address to one of these Cost Sets.

Build IP Map

The IP Map is a sorted list of records which map IP address ranges to Link Cost Table keys. The format of the IP map is:

where IP addresses are presently represented by 32-bit integers. The entries are sorted by descending base address, and by ascending maximum address among equal base addresses, and by ascending Link Cost Table key among equal base addresses and maximum addresses. Note that ranges may overlap.

The NetMap procedure generates an intermediate IP map containing a map between IP address ranges and Cost Set numbers as follows: For each Cost Set S: For each Group G in S: For each IP address range in G: Add a triple flow address, high address, Cost Set number of S) to the IP map.

The IP map file is then sorted by descending base address, and by ascending maximum address among equal base addresses, and by ascending Cost Set number among equal base addresses and maximum addresses. The sort order for the base address and maximum address minimizes the time to build the Group Reduction Table and produces the proper results for overlapping entries.

Finally, the NetMap procedure creates the Group Reduction Table by processing the sorted IP map. The Group Reduction Table maps IP addresses (specified by ranges) into Cost Set numbers. Special processing of the IP map file is required in order to detect overlapping address ranges, and to merge adjacent address ranges in order to minimize the size of the Group Reduction Table.

An ordered list of address range segments is maintained, each segment consisting of a base address B and a Cost Set number N, sorted by base address B. (The maximum address of a segment is the base address of the next segment minus one.)

The following algorithm is used: Initialize the list with the elements [-infinity, NOGROUP], [+infinity, NOGROUP]. For each entry in the IP map, in sorted order, consisting of (b, m, s), Insert (b, m, s) in the list (recall that IP map entries are of the form (low address, high address Cost Set number of S)) For each reserved LAN address range (b, m): Insert (b, m, LOCAL) in the list. For each Repeater at address a: Insert (a, a, REPEATER) in the list. For each segment S in the ordered list: Merge S with following segments with the same Cost Set Create a Group Reduction Table entry with base address from the base address of S, max address=next segment's base -1, group ID=Cost Set number of S.

A reserved LAN address range is an address range reserved for use by LANs which should not appear as a global Internet address. LOCAL is a special Cost Set index different from all others, indicating that the range maps to a client which should never be reflected. REPEATER is a special Cost Set index different from all others, indicating that the address range maps to a repeater. NOGROUP is a special Cost Set index different from all others, indicating that this range of addresses has no known mapping.

Given (B, M, N), insert an entry in the ordered address list as follows: Find the last segment (AB, AN) for which AB is less than or equal to B. If AB is less than B, insert a new segment (B, N) after (AB, AN). Find the last segment (YB, YN) for which YB is less than or equal to M. Replace by (XB, N) any segment (XB, NOGROUP) for which XB is greater than B and less than YB. If YN is not N, and either YN is NOGROUP or YB is less than or equal to B, Let (ZB, ZN) be the segment following (YB, YN). If M+1 is less than ZB, insert a new segment (M+1, YN) before (ZB, ZN). Replace (YB, YN) by (YB, N).

Rewriting HTML Resources

As explained above with reference to FIG. 3 (B5), when a reflector or repeater serves a resource which itself includes resource identifiers (e.g., a HTML resource), that resource is modified (rewritten) to pre-reflect resource identifiers (URLs) of repeatable resources that appear in the resource. Rewriting ensures that when a browser requests repeatable resources identified by the requested resource, it gets them from a repeater without going back to the origin server, but when it requests non-repeatable resources identified by the requested resource, it will go directly to the origin server. Without this optimization, the browser would either make all requests at the origin server (increasing traffic at the origin server and necessitating far more redirections from the origin server), or it would make all requests at the repeater (causing the repeater to redundantly request and copy resources which could not be cached, increasing the overhead of serving such resources).

Rewriting requires that a repeater has been selected (as described above with reference to the Best Repeater Selector). Rewriting uses a so-called BASE directive. The BASE directive lets the HTML identify a different base server. (The base address is normally the address of the HTML resource.)

Rewriting is performed as follows: F1. A BASE directive is added at the beginning of the HTML resource, or modified where necessary. Normally, a browser interprets relative URLs as being relative to the default base address, namely, the URL of the HTML resource (page) in which they are encountered. The BASE address added specifies the resource at the reflector which originally served the resource. This means that unprocessed relative URLs (such as those generated by Javascript.TM. programs) will be interpreted as relative to the reflector. Without this BASE address, browsers would combine relative addresses with repeater names to create URLs which were not in the form required by repeaters (as described above in step D1). F2. The rewriter identifies directives, such as embedded images and anchors, containing URLs. If the rewriter is running in a reflector, it must parse the HTML file to identify these directives. If it is running in a repeater, the rewriter may have access to pre-computed information that identifies the location of each URL placed in the HTML file in step F4). F3. For each URL encountered in the resource to be re-written, the rewriter must determine whether the URL is repeatable (as in steps B1-B2). If the URL is not repeatable, it is not modified. On the other hand, if the URL is repeatable, it is modified to refer to the selected repeater. F4. After all URLs have been identified and modified, if the resource is being served to a repeater, a table is appended at the beginning of the resource that identifies the location and content of each URL encountered in the resource. (This step is an optimization which eliminates the need for parsing HTML resources at the repeater.) F5. Once all changes have been identified, a new length is computed for the resource (page). The length is inserted in the HTTP header prior to serving the resource.

An extension of HTML, known as XML, is currently being developed. The process of rewriting URLs will be similar for XML, with some differences in the mechanism that parses the resource and identifies embedded URLs.

Handling Non-HTTP Protocols

This invention makes it possible to reflect references to resources that are served by protocols other than HTTP, for instance, the File Transfer Protocol (FTP) and audio/video stream protocols. However, many protocols do not provide the ability to redirect requests. It is, however, possible to redirect references before requests are actually made by rewriting URLs embedded in HTML pages. The following modifications to the above algorithms are used to support this capability.

In F4, the rewriter rewrites URLs for servers if those servers appear in a configurable table of cooperating origin server or so-called co-servers. The reflector operator can define this table to include FTP servers and other servers. A rewritten URL that refers to a non-HTTP resource takes the form:

where <scheme> is a supported protocol name such as "ftp". This URL format is an alternative to the form shown in B3.

In C3, the repeater looks for a protocol embedded in the arriving request. If a protocol is present and the requested resource is not already cached, the repeater uses the selected protocol instead of the default HTTP protocol to request the resource when serving it and storing it in the cache.

System Configuration and Management

In addition to the processing described above, the repeater network requires various mechanisms for system configuration and network management. Some of these mechanisms are described here.

Reflectors allow their operators to synchronize repeater caches by performing publishing operations. The process of keeping repeater caches synchronized is described below. Publishing indicates that a resource or collection of resources has changed.

Repeaters and reflectors participate in various types of log processing. The results of logs collected at repeaters are collected and merged with logs collected at reflectors, as described below.

Adding Subscribers to the Repeater Network

When a new subscriber is added to the network, information about the subscriber is entered in a Subscriber Table at the master repeater and propagated to all repeaters in the network. This information includes the Committed Aggregate Information Rate (CAIR) for servers belonging to the subscriber, and a list of the repeaters that may be used by servers belonging to the subscriber.

Adding Reflectors to the Repeater Network

When a new reflector is added to the network, it simply connects to and announces itself to a contact repeater, preferably using a securely encrypted certificate including the repeater's subscriber identifier.

The contact repeater determines whether the reflector network address is permitted for this subscriber. If it is, the contact repeater accepts the connection and updates the reflector with all necessary tables (using version numbers to determine which tables are out of date).

The reflector processes requests during this time, but is not "enabled" (allowed to reflect requests) until all of its tables are current.

Keeping Repeater Caches Synchronized

Repeater caches are coherent, in the sense that when a change to a resource is identified by a reflector, all repeater caches are notified, and accept the change in a single transaction.

Only the identifier of the changed resource (and not the entire resource) is transmitted to the repeaters; the identifier is used to effectively invalidate the corresponding cached resource at the repeater. This process is far more efficient than broadcasting the content of the changed resource to each repeater.

A repeater will load the newly modified resource the next time it is requested.

A resource change is identified at the reflector either manually by the operator, or through a script when files are installed on the server, or automatically through a change detection mechanism (e.g., a separate process that checks regularly for changes).

A resource change causes the reflector to send an "invalidate" message to its contact repeater, which forwards the message to the master repeater. The invalidate message contains a list of resource identifiers (or regular expressions identifying patterns of resource identifiers) that have changed. (Regular expressions are used to invalidate a directory or an entire server.) The repeater network uses a two-phase commit process to ensure that all repeaters correctly invalidate a given resource.

The invalidation process operates as follows:

The master broadcasts a "phase 1" invalidation request to all repeaters indicating the resources and regular expressions describing sets of resources to be invalidated.

When each repeater receives the phase 1 message, it first places the resource identifiers or regular expressions into a list of resource identifiers pending invalidation.

Any resource requested (in C3) that is in the pending invalidation list may not be served from the cache. This prevents the cache from requesting the resource from a peer cache which may not have received an invalidation notice. Were it to request a resource in t