United States Patent7210145
SrinivasanApril 24, 2007

Title

Technology for integrated computation and communication; TICC

Abstract

TICC manages asynchronous communications among groups of concurrent (parallel) processes (Objects) in multiprocessors. It dedicates one processor to function as Communications Processor, C. TICC defines a Causal Communication programming Language, called CCL, whose statements may appear intermixed with statements of any conventional programming language. TICC defines methods for compiling CCL statements in these mixed language programs into sequences of protocols which are executed by C, in parallel with on going computations guaranteeing (1) group-to-group loss less, buffer less, self-synchronizing asynchronous data transfers; (2) more than a hundred fold reduction in communication latencies; (3) Dynamic Flexibility to monitor, repair, reconfigure and update software objects without service interruption; and (4) protection and security of all data, distributed to or collected from communicating objects, based on an Agreement Protocol.


Inventors:Srinivasan; Chitoor V. (Port Saint Lucie, FL)
Assignee:EDSS, Inc. (Port St. Lucie, FL)
Appl. No.:10/265,575
Filed:October 7, 2002
PCT Pub Date:May 2, 2007

Current U.S. Class:718/100 719/313 719/314 719/317 709/202 709/203 709/234 709/239 709/244 717/107 717/140 717/168 
Current International Class:G06F 15/16 (20060101) G06F 9/45 (20060101) G06F 9/46 (20060101) G06F 9/48 (20060101) G06F 9/54 (20060101)
Field of Search:717/100-178 719/310-332

U.S. Patent Documents
5297285March 1994Abrahamsson et al.
5339430August 1994Lundin et al.
5386568January 1995Wold et al.
5410703April 1995Nilsson et al.
5488723January 1996Baradel et al.
5551035August 1996Arnold et al.
5640546June 1997Gopinath et al.
6389483May 2002Larsson
6751787June 2004Blaszczak et al.
6789257September 2004MacPhail
6807583October 2004Hrischuk et al.
6912718June 2005Chang et al.
7003777February 2006Hines
Other References
Birman, Kenneth et al. "Lightweight Causal and Atomic Group Multicast." ACM Transactions. Aug. 1991. cited by examiner .
Das, Souripriya. "RESTCLK: A communication Paradigm for Observation and Control of Object Interactions." PH.D Dissertation, Department of Computer Science, Rutgers University. Jan. 1999. DCS-TR-450. cited by exa- miner .
von Eicken, T. et al. "Active Messages: A Mechanism for Integrated Communication and Computation." May 1992. cited by examiner .
Steele, Guy L. "Common LISP", The Language, Second Edition. Chapters 1-7, 15 and 20. Butterworth-Heinemann. 1990. cited by examiner .
Wilensky, Robert. "LISPcraft". W.W. Norton & Company. Chapters 1-4, 7-9 and 19. 1984. cited by examiner.~
Primary Examiner: Bullock, Jr.; Lewis A.

Parent Case Text



CROSS REFERENCE TO RELATED APPLICATIONS

This is the non-provisional of the provisional application, entitled "Technology for buffer less self-synchronizing peer-to-peer message passing in distributed computing systems",

Application No. 60/329,454 filed on Oct. 15, 2001;

Confirmation Number 3551, Dated Nov. 02, 2001.

Foreign Filing License Granted, Nov. 1, 2001, under Title 35, United States Code, Section 184; & Title 37, Code of Federal Regulations 5.11 & 5.15.

Claims


What is claimed is:
1. In a software for a parallel processing application, A, composed of software objects (programs) called cells, cells together constituting the program of parallel processing application, a method for using the new Technology for Integrated Computation and Communication (TICC) to specify and implement guaranteed asynchronous high-speed buffer-free parallel communications among cells, without assistance from an operating system, this being the primary characteristic of integration of computation and communication; each cell being a program or a canonical representation of a program containing at least six ports, said cell being the parent-cell of said ports attached to the said cell, wherein cell-to-port attachments enabling said port and said cell to exchange data with each other and ports being themselves software objects; cells running in parallel with each other on multiple processors with each cell running in its own assigned processor or some cells running concurrently in processors that they share with other cells, at least one of the processors being dedicated as communications processor, each communications processor being associated with one or more cells, being called client-cells of said communication processor and said communications processor being called communications-server of each said client-cell, no client-cell having more than one communications-server associated with it; communications processors enabling cells to communicate with each other in parallel asynchronously, using their respective ports via specially established pathways that connect message sending ports to message receiving ports, each said pathway containing its own unique pathway-memory to hold a message that is being delivered, each pathway containing a finite number of agents to coordinate and synchronize message transfers and enforce security, pathways, agents and pathway-memories being themselves software objects defined by programs; agents on said pathway being organized into a ring around pathway-memory of said pathway wherein each agent has a next agent, the next agent of the last agent being the first agent, this ring being referred to as agent-ring of said pathway; models of TICC-sequential and TICC-parallel computations being such that no port would receive a second message while its parent cell was still working on the first received message so that no pathway-memory ever contains a buffer with message queue of more than one message, thus making it possible to have communications, free of buffers holding message queues, hereafter referred to as buffer-free communications; message sending ports being connected to one of the agents on the agent-ring of pathway, said message-receiving ports being similarly connected to another agent on said agent-ring of said pathway, said another agent being next to the agent to which said message sending ports were connected, each message sending port sending a signal to the agent it is connected, after receiving a signal from its own parent-cell to send off a message in pathway-memory of said pathway to said message receiving ports, said agent then sending a signal to its next agent and said next agent then broadcasting signals to all ports in said message receiving port-group, each port in said message receiving port group then posting a new message receipt signal on its parent-cell, thus causing message in said pathway-memory to be delivered to parent-cells of ports in said message receiving ports; protocols for cell to port, port to agent, agent to agent, agent to port and port to cell signal exchanges being executed by said communications processors, without assistance from said operating system, causing signals to travel along a pathway from said message sending ports to said message receiving ports and eventually establishing a context in which parent-cells of message receiving ports may receive and respond to messages in the pathway-memory of said pathway; The method comprising the following steps: dynamically allocating real memories to pathway-memories; specifying port-groups containing ordered collections of one or more distinct ports such that no two ports in a port-group have the same parent-cell, the number of parent-cells of ports in a port-group being the same as the number of ports in that port-group, each port-group thus forming a corresponding cell-group consisting of ordered collection of parent-cells of ports in that port-group, said cell-group being always associated with said port-group; dynamically establishing pathways between pairs of distinct port-groups, and for some other port-groups, if any, establishing a self-loop pathway from each of the said other port-group to itself, always making sure that each port in port-group is connected to only one pathway, said pathways and said port-groups together constituting the software network of parallel processing application; cells exchanging messages and responding to received messages as they receive them, in parallel with each other in said multiple processors with designated communication processors, using pathways in the software network to exchange messages, the said program of application together with said software network constituting the application system; dynamically modifying existing pathways in said software network of application A without service interruption and without loss of data flowing through pathways in the said software network of application A, while cells in said program of application A are running their respective programs in parallel; in each pathway in the network of application A connecting a message sending port-group to a message receiving port-group, tuning each port in said message sending and message-receiving port-groups to its parent-cell, and tuning ports in said message sending and said message receiving port groups to agents they are respectively connected to, said tuning causing each (port, parent-cell) and (port, agent) pair on said pathway to always listen to each other every time they exchange signals, no parent-cell, port or agent wasting time waiting for synchronization in order to send signals, thus facilitating high-speed message delivery, and no parent-cell, port or agent ever missing to receive and respond correctly and promptly to received signals, thus guaranteeing message delivery across said pathway from said message sending ports to said message receiving ports; specifying group-to-group parallel communications among port-groups of said software network of application A, through use of Causal Communication Language (CCL) statements, which may be freely intermixed with any conventional programming language statements used to define programs of cells, ports and agents, CCL statements appearing in program of cell specifying signal exchange between said cell and its ports, CCL statements appearing in program of port specifying signal exchange either between said port and its parent-cell or between said port and agent tuned to said port, CCL statements appearing in an agent specifying signal exchange either between said agent and port tuned to said agent or between said agent and its next agent; building a CCL-sequence for each pathway in said software network and for each port connected to said pathway that may send messages via said pathway, being hereafter called a sending-port, each such CCL-sequence being a concatenation of CCL statements appearing in cells, ports and agents in said application system, whose execution would cause signals to be transmitted from the parent-cell of said sending-port to parent-cells of all ports in message receiving port-group connected to said pathway, CCL statements appearing in said CCL-sequence of said sending-port being in the following order: first CCL statement specifying signal to be sent by parent-cell of said sending-port to said sending-port itself, second CCL statement specifying signal to be sent from said sending-port to agent tuned to said sending-port, third CCL statement specifying signal to be sent from said agent to its next agent on the agent-ring of pathway connected to said sending-port, fourth CCL statement specifying signals to be broadcast from said next agent to all ports in the message receiving port-group of said pathway, and the last CCL statement specifying signals to be sent from each port in said message receiving port-group to its parent-cell, there being one such CCL-sequence for each sending-port in said software network, execution of protocols associated with all CCL-sequences of all sending-ports in a message sending port-group, in the order CCL statements appear in said CCL-sequences, causing message in pathway-memory of pathway connected to said message sending port-group to be delivered to all parent-cells of ports in the message receiving port-group of said pathway for each message sending-port in a message sending port-group presenting to communications-server of its parent-cell the CCL-sequence of said sending-port, when message in pathway-memory of pathway connected to said sending-port is ready to be sent, said communications-server executing CCL statements in each CCL-sequence presented to it, in the order CCL statements appear in said CCL-sequence, communications-server executing without fail every CCL-sequence presented to it in order they were presented, thus causing messages sent by every sending-port in every message sending port-group to be delivered to its corresponding message receiving port-group; each parent-cell of each port in said message receiving port-group, when activated to run the program of said parent-cell in a processor of said multiprocessor system, polling ports of said parent-cell in a predetermined cyclic order one port at a time to check for presence of a new message delivered to said port, cyclic polling continuing until said parent-cell terminates its operations based on satisfaction of predefined conditions or based on receipt of an interrupt signal from another cell, for any port of said parent-cell polling said port being in one of two modes: responding immediately to delivered new message at the polled port and then proceeding to poll its next port, if new message is indeed present at the polled port at the time of polling said polled port, else proceeding immediately to poll the next port of said polled port, if no new message is present at the polled port at the time of polling said polled port, or waiting at said polled port for a new message to arrive, responding to said new message when it arrives and then polling the next port of said polled port; parent-cell of each port in a message receiving port-group hereinafter called the receiving-port, responding to received new message at said receiving-port, only after all parent-cells of all ports in said message receiving port-group had completed all of their respective computations triggered by receipt of said new message and had written their respective response messages in the pathway-memory of pathway connected to said message receiving port-group, messages so written in said pathway-memory by all said parent-cells of ports in the said message receiving port-group being the joint-response-message, parent-cell of each said receiving-port then presenting CCL-sequence of said receiving-port to communications-server of said parent-cell, thereby causing said joint-response-message to be eventually delivered back to cells that sent said new message; and each pathway-memory supporting computations performed by parent-cells of ports tuned to said pathway-memory while parent-cells are responding either to message received from said pathway-memory, by providing execution environment for programs that respond to said received message, so that said message in said pathway-memory need not be copied in order for said parent-cells to respond to said message.

2. A method as recited in claim 1 further including the following steps for defining each port by: tuning each port to the pathway-memory of pathway connected to that port, with restriction that no two ports of a cell could be tuned to the same pathway-memory, to enable said port to give read/write access to said pathway-memory only to its parent-cell, there by protecting said pathway-memory; specifying pathway establishment security conditions for each port, said port being responsible to check satisfaction of specified pathway establishment security conditions before allowing a pathway to be established and connecting said port to that pathway; allowing said pathway to be established only if pathway connecting said port-group to another port-group does not violate specified pathway establishment security conditions, reporting in a predetermined manner attempts made to establish pathways that violated said specified pathway establishment security conditions; gathering and reporting data on their performance to a specialized System-Doctor-Cell running in its own dedicated processor, in parallel with cells, to facilitate dynamic monitoring of application system performance, reporting time-outs and poor performance, and thus providing infrastructure for implementing programs for self-diagnosis and self-repair; and activating its parent-cell on receipt of a new message, if said parent-cell had not been already activated, and thereafter each port to efficiently cause appropriate thread to be scheduled and executed without assistance from said operating system, based on said new message class, in order to correctly respond to said new message.

3. A method as recited in claims 1 or 2 further including following steps for defining each agent by: specifying for each connected (agent, port) pair message security conditions to be enforced by said (agent, port) pair, so that no message is delivered by said agent to said port unless specified message security conditions are satisfied; enforcing specified message security conditions by dynamically monitoring security status so that no port that is tuned to an agent receives message from any other port in violation of specified message security conditions, and maintaining and reporting appropriate records on all violations of specified message security conditions when they occur; tuning agents to ports in a message sending port-group connected to a pathway, to synchronize and coordinate messages sent out in parallel by parent-cells of ports in said sending port-group making sure no message would be sent out until parent-cells of all ports in said sending port-group had fully completed their respective computations and had fully written out their respective messages into said pathway-memory of pathway connected to said sending port-group, and delivering messages sent out by said parent-cells exactly once to each port in the receiving port-group connected to said pathway, even though each parent-cell of each port in said sending port-group sends messages in parallel, independent of each other; and tuning agents to ports in a message receiving port-group connected to a pathway, to synchronize and broadcast message in pathway-memory of said pathway to every port in said receiving port-group in each cycle of computation, said agent being responsible to protect and preserve data in pathway-memory of said pathway until parent-cells of all ports in said receiving port-group had fully received the delivered message and responded to that message.

4. Method as recited in claims 1, 2 or 3, further including the following steps: installing new cells with ports attached to them, hereinafter referred to as monitor cells, ports of monitor cells being referred to as monitor-ports, defining monitor-port-groups consisting of ordered collections of distinct monitor-ports of distinct monitor-cells, no monitor-port-group containing more than one monitor-port of any monitor-cell, installing pathways among selected pairs of monitor-port-groups, installing isolated new agents not connected to any pathway, said new agents being called monitor-agents, selecting monitor-port-groups not already connected to a pathway and connecting each selected monitor-port-group to an already installed monitor-agent, tuning said monitor-agent to monitor-ports in said monitor-port-group connected to said monitor-agent, the network consisting of monitor-cells together with all installed pathways and installed monitor-agents being the monitor-network; dynamically attaching monitor-agents in said monitor-network to selected points in pathways of said software network, the selected points being points next to an agent in an agent-ring of a pathway already in the software network or a point in the part of a pathway in the software network that connects an agent in said pathway to a port it is tuned to, or connecting directly and tuning a monitor-port not connected to any pathway or any monitor-agent in said monitor-network, to an agent in a pathway of software network, all of this being done without service interruption, the monitor-network thus installed on a software network becoming a part of said software network enlarging the said software network, each monitor-cell in said monitor-network operating in parallel with all other cells in said enlarged software network, each in its own designated processor of a multiprocessor; for each port in the software network that may send out a message, referred to as the sending-port, dynamically re-computing CCL-sequence of said sending-port as and when needed, to correctly route message that is being sent via pathway connected to said sending-port via monitor network that may be attached to pathway connected to said sending-port; and dynamically installing as many monitor-networks on a software network as necessary, as and when needed, allowing for the possibility of installing monitor-networks on other monitor-networks already installed on a software network.

5. A method as recited in claim 4 further including the following steps: dynamically installing monitor network on agents tuned to port-groups in a software network, in a manner such that computations performed by parent-cells of said port-groups tuned to said agents may be temporarily suspended and data associated with suspended parent-cells may be selectively examined for the purpose of debugging an application program; dynamically installing monitor network consisting of programs needed to reconfigure an application system according to specified reconfiguration criteria, without service interruption and without loss of any of the benefits already obtained from computations done up to the time of reconfiguration, if such reconfiguration was consistent with correct operation of application system; and dynamically installing monitor network needed to test new versions of cells in exactly the same context in which their older versions function, new versions receiving exactly the same input messages that their older versions receive in every cycle of computation, in parallel with said older versions, and replace said older versions with their corresponding said new versions, after testing has been successfully completed, all without service interruption, thus making possible dynamic evolution of application systems.

6. A method as recited in one of claims 1 through 5 above, implemented in hardware or software, including the following steps: using agents, ports, or their equivalents, to compile CCL statements or their equivalents to sequences of protocols and presenting protocols to one or more dedicated communications processors, which execute presented protocols in parallel with computations performed by cells and cause messages to be delivered to their intended recipients via already established pathways in said software network; specifying and enforcing pathway and message security conditions; dynamic monitoring, debugging, reconfiguring and updating of an application system, consisting of cells that constitute the program of application and pathways that constitute the software network of application, in parallel with ongoing parallel computations being performed by cells in a multiprocessing computer with multiplicity of processors, using dynamically installed monitor-networks; and using ports, and System-Doctor-Cell running on a dedicated processor, to detect possible malfunctions, start automatic diagnosis, and repair procedures, if such procedures had been already designed and built into application system, or else report possible malfunctions in a predetermined manner.

Description

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

The work that led to this patent application was not supported by any federally sponsored research. The inventor himself supported this work and inventor enjoys all rights to ownership and use of the patent claimed in this application.

FIELD OF THE INVENTION

These inventions relates generally to asynchronous communications in software systems and, more specifically to asynchronous communications in massively parallel software systems. It relates to a methodology for specifying, compiling and executing communication protocols in parallel with computations in multiprocessor systems, guaranteeing buffer less, loss less, low latency, self-synchronizing asynchronous data exchange among groups of software objects, with extremely low latencies.

COPYRIGHT NOTICE

Some of the figures in this patent disclosure are subject to copyright protection. The copyright owner has no objection to facsimile reproduction of any of the figures, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

This work pertains to asynchronous communications among parallel/concurrent processes running in multiprocessing systems. The invention looks ahead to the fast approaching era, when 3 to 5 gigahertz CPUs and gigabyte bandwidth communications would be commonplace, and cost of CPUs would have dramatically decreased. Multiprocessors with 100 to 1000 CPUs might become feasible. A great impediment to their effective use under current technologies would be the cost of communication latencies. In current technology there are two paradigms for communication among parallel or concurrent processes with in a multiprocessor (whether distributed or shared memory): (i). By conducting synchronization sessions between sending and receiving processes, and using input buffers to hold data/messages until recipients can attend to them. This is very time consuming and can also put a significant drain on available memory and processing resources. This can take as much as 5,000 to 10,000 CPU cycles for point-to-point asynchronous message exchange between two communicating parallel processes. (ii). Through programmed periodic punctuated synchronous message exchange sessions, in which all the parallel processes are halted in a mutually consistent state, and each process is programmed to exchange data/messages with other processes in a synchronized a prior planned manner, thus eliminating the need for asynchronous communications. This can, however, result in inefficient utilization of available parallel processing resources. Yet, this gives better efficiencies than method (i) above.

TICC introduces a radically new paradigm; it is a paradigm of asynchronous communications in parallel with computations. TICC defines a Causal Communication programming Language, CCL, which can be freely intermixed with any computation specification language. CCL may be used to specify communications among groups of parallel/concurrent processes. TICC specifies a method for compiling statements in CCL to sequences of protocols, which are executed by a dedicated communications processor, C, in parallel with computations, guaranteeing loss less, self-synchronizing buffer free data exchange among groups of parallel processes via already established TICC pathways.

In addition, TICC provides unprecedented capabilities to dynamically monitor, debug and update parallel programs. There are several other advantages to using TICC. They are explained in this patent disclosure.

TICC was inspired by the communication paradigm, called RESTCLK [Das 1999] that was first introduced by my student Dr. Souripriya Das in 1999, while both of us were in the Computer Science Department of Rutgers, The State University of New Jersey, New Brunswick, N.J. As a doctoral student Dr. S. Das started his research with Prof. B. Gopinath and Prof. Chitoor V. Srinivasan as his doctoral thesis advisers. Prof. B. Gopinath was then professor of Computer Engineering in the Department of Electrical and Computer Engineering at Rutgers. The system concepts of RESTCLK and techniques for their implementation were developed under the guidance of Dr. B. Gopinath When Prof. Gopinath left Rutgers University, sometime in 1996, Dr. S. Das continued his work with Prof. Chitoor V. Srinivasan to complete his doctoral dissertation requirements. During this time Prof. Srinivasan and Dr. Das jointly developed the theoretical framework for RESTCLK, which established the structure and properties of RESTCLK. A proof of concept implementation of RESTCLK was successfully developed and demonstrated by Dr. Das in the summer of 1998. Professor Srinivasan retired from Rutgers University in January 1999.

RESTCLK, as it was defined, could not be efficiently implemented and it did not guarantee loss less message delivery. This made it Impossible to deploy RESTCLK for use in applications. Indeed, Prof. Gopinath abandoned his efforts to patent the ideas in RESTCLK (please see reference in U.S. Pat. No. 5,640,546, to application Ser. No. 08/021,096, filed on Feb. 23, 1993, and abandoned). RESTCLK remained essentially a research tool. The modified communication paradigm, TICC, which executes communication protocols in parallel with computations, overcomes this limitation of RESTCLK. TICC guarantees loss less, buffer less, self-synchronizing asynchronous message delivery with extremely low latencies. TICC now makes it possible to fully exploit on a commercial scale, the dynamic flexibility feature (ability to dynamically monitor, diagnose, repair, reconfigure and update parallel software systems without service interruption), which RESTCLK attempted to bring to object centered application systems.

SUMMARY

1 Introduction

If `Network is the Computer` then Communication is at the heart of computation. Unless the two are seamlessly integrated the full potential of Network as the Computer will never be realized. TICC provides a way of seamlessly integrating the two.

The dichotomy between computation and communication in our current software technology is an artificial one. As explained later below, it does not exist in hardware systems. The history of development of our programming technology has been largely responsible for it. Communication has always been outside the scope of programs: Input/Output is done by hardware. It is not a part of even our theoretical models of sequential programming. Assignments, If/Then Statements, While Statements and conventions for program control are the sole constituents of our theoretical models of sequential programs. (See "Discrete Mathematics and its Applications", by Kenneth H. Rosen, Third Edition, McGraw Hill Inc. 1995, pp 217 223 and pp 694 701.) Communication and coordination are not a part of them. Turing Machine provides a model of sequential computations. There is no similar model for parallel computations. Multi-tape Parallel Turing Machine does not model inter-process communications. It assumes a totally synchronous single multi-tape machine and thus avoids problems of asynchronous communication. There has been no integrated conceptualization of communication and computation as a unified entity. This has largely stymied our progress in realizing our dream of Network as the Computer.

The reasons are quite simple. A necessary requirement for communication to occur in a timely manner is that receivers and senders should hold a certain synchrony: Receivers should listen to senders at right times and should fully absorb messages. Among parallel processes that run in different processors, clearly such a synchrony does not naturally exist. Even among concurrent processes that run in the same processor such synchrony does not exist. In addition to this lack of synchrony among parallel and concurrent processes, there is lack of synchrony also between communication events and computational events. To bridge the gap caused by this lack of synchrony one uses protocols, synchronization sessions and buffers for asynchronous message passing, or programmed punctuated synchronous data exchange sessions in parallel computations. But, usually these are not viewed as being essential parts of computations. They are viewed as necessary evils. One has to suffer them in order to accomplish ones computational objectives. Sometimes they are even hidden from programmers.

2. Limitations of Current Technology

As of this writing, 3-Gigahertz CPUs and 10-Gigabits/sec networks are already here. Massive Concurrent (parallel) Computing faces serious problems in this new age of 3-gigahertz or more CPUs and 10-Gigabits/sec or more networks. The following three stand out as being most critical: Communication Bottleneck, Debugging Bottleneck and Memory Bottleneck. Communication bottleneck is caused by high latency in both local (within a multiprocessor) and remote (between multiprocessors) asynchronous communications. Whereas advances in technology will surely contribute to latency reduction in remote asynchronous communications, latency In local communications between concurrent (parallel) processes will continue to remain high.

This is because, the primary reason for latency in local asynchronous communications is lack of synchrony between communication and computation events referred to earlier. To overcome problems caused by this lack of synchrony one has to continue to use synchronization sessions and buffers. These add to latency. Debugging bottleneck is caused by not having adequate tools to dynamically debug parallel processes. This will continue to be a formidable, time-consuming problem. Memory bottleneck is caused by limitations in available memory bandwidth. Data cannot be fetched from memory at rates adequate to feed the needs of all active parallel processes. Advances in technology that increase available memory bandwidth cannot by themselves solve this problem.

TICC provides clear solutions to the first two of above three problems and has the potential to solve also the third problem. TICC manages asynchronous local (shared memory) communications between parallel and/or concurrent processes within a multiprocessor. Nature of innovation in TICC and its benefits are introduced next section.

3. Nature of Innovation in TICC.

3.1. Hardware Analogy

Nature of innovation in TICC is best explained through the following analogy with hardware systems. Consider a synchronous clock driven logic circuit. Gates in the circuit will listen to their inputs when clock signal is high and perform their computations when clock signal is low. New inputs will appear at a gate only after it has completed its computations on its current inputs. In this way clock pulses enforce synchrony between computational events and communication events in synchronous circuits. It is almost axiomatic to characterize asynchronous communication as one that necessarily requires buffers or input queues. This, however, is not the case in asynchronous hardware circuits. In asynchronous circuits, one can replace clock pulses with suitable self-synchronizing two-way control pulses and make the gates listen to their Input signals at right times and send completion signals back. Here also new input signals will appear at a gate only after it has completed computations on its current inputs. In both cases, clock pulses from a master clock or self-synchronizing two-way control pulses, will permeate through every logical unit in a hardware system enforcing synchrony between computational and communication events. A structural isomorphism will hold between synchronizing pulse distribution networks and interconnection networks of logical units In hardware systems. This will eliminate the need for buffers and synchronization sessions for correct message (data) dispatch and delivery. Communications and computations will flow together as a single integrated self-synchronized organic process. Thus the dichotomy between communication and computation, mentioned earlier, does not exist in hardware systems.

The invention in TICC is that it employs self-synchronizing software message pathways (called TICC-pathways), which function like self-synchronizing signal lines in hardware systems and like back-panel data buses in supercomputer systems. These pathways are used to asynchronously exchange data among groups of parallel/concurrent processes. Just as asynchronous signal lines enforce synchrony between communication and computational events in hardware systems, self-synchronizing message pathways also enforce synchrony between computation and communication events, at the level of Objects. New inputs will appear at input ports of software objects only after they have completed their computations on their current inputs. These characteristics of TICC will eliminate the need for buffers (input queues) and synchronization sessions.

Just as asynchronous signal lines in a hardware systems hold a structural isomorphism between control signal distribution networks and logical unit interconnection networks, the collection of TICC-pathways in a parallel application systems will hold a structural isomorphism between TICC-pathway control signal distribution networks and sequential and parallel computational networks at objects level.

3.2 Properties of TICC

TICC shares the following properties with its hardware analog, the back-panel bus: Connection Oriented: Only groups of objects with specially established TICC-pathway connections between them might exchange messages. Self-Synchronizing: Receivers need not be informed a priori and prepared in order to listen to and fully absorb asynchronous messages sent to them. No synchronization sessions are necessary (true for RESTCLK). Guaranteed Buffer-free One Time Message Delivery: No message will be lost (not true for RESTCLK). Reduced Communication Costs: Dramatically reduces communications latency almost a hundred fold (not true for RESTCLK). Prevents Message Interference: No message recipient will receive two or more messages at the same time (true for RESTCLK).

In addition TICC does the following: a Enforces Correct synchronization and coordination (not true for RESTCLK because it does not guarantee loss less message delivery) of computational and communication events throughout a distributed parallel computational system. Provides Dynamic flexibility (true for RESTCLK but could not be exploited on a commercial scale) with practically unlimited capability to efficiently and dynamically monitor, diagnose, repair, reconfigure and update parallel computational systems without service interruption. Integrates communication and computation (not true for RESTCLK) by defining a causal communications programming language, the TICC-language, which can be freely combined with any conventional programming language. Defines methods for efficiently compiling CCL statements (not true for RESTCLK) specified in a unified computation and communication language into sequences of protocols which are executed in parallel with computations by a dedicated communications processor.

3.3 The most Important Features of TICC

The four most important features of TICC are the following: 1) Self-synchronizing: No synchronization sessions are necessary. Provides buffer free loss less message distribution, with dramatic reductions in communication latency (not true for RESTCLK), 2) Structural isomorphism: Control signal distribution network in TICC consisting of TICC-pathways will be structurally isomorphic to sequential/parallel computational structures of application systems (true for RESTCLK), and 3) Dynamic Flexibility: The word dynamic is used here in the following technical sense: Changes may be introduced in to application systems without service interruption and without loss of benefits gained from prior computations. This is a consequence of the above two properties (true for RESTCLK but could not be efficiently implemented). 4) Independence: (true for RESTCLK but could not be exploited effectively): TICC based application systems enjoy dynamic flexibility in a manner that is totally independent of code embedded in application objects. Thus facilities needed to diagnose and repair failures, reconfigure a system, or make updates may all be dynamically introduced in to an application system, at any time during its lifetime, whether they have been planned for in advance or not.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1: This introduces the various components of a TICC pathway and shows how the components are put together to form a pathway that enable two groups of objects to communicate with each other, as explained in Chapter 1 of Detailed Description.

FIG. 2: This figure shows the organization of pending lists associated with multiprocessors, X.sub.1, X.sub.2, . . . X.sub.N and shows how segments S.sub.ij of protocol functions are executed by the segment driver, as explained in Chapter 1 of Detailed Description.

FIG. 3: This figure shows how monitor agents and monitor objects may be attached to delivery and dispatch sides of a watch-ring to monitor input and output data flowing into and out of an object port.

FIG. 4: Illustrates the TICC model of sequential computations.

FIG. 5: Illustrates the TICC model of parallel computations.

FIG. 6: Shows how monitor agents may be attached to a clock-ring to monitor data flowing into and flowing out of a group of objects.

FIG. 7: Illustrates how a probe network may be attached to an agent, in order to suspend computations being performed by objects O.sub.1 and O.sub.2 in the figure.

FIG. 8: Illustrates a typical TICC pathway structure suitable for dynamic reconfiguration of an application system.

FIG. 9: Illustrates the pathway structure that may be used to support dynamic evolution.

FIG. 10: Tracing a protocol segment from a sending port to receiving ports via agents on a clock-ring.

FIG. 11: Tracing a protocol segment from a port to a monitor port, via monitor agent on dispatch side of a watch-ring.

FIG. 12: Tracing a protocol segment from a port to monitor ports on delivery side of a watch-ring.

FIG. 13: Tracing a protocol segment from a dispatch-side monitor port to delivery-side monitor-ports.

FIG. 14: Tracing a protocol segment from a delivery-side monitor port to an object port.

FIG. 15: Tracing protocol segments from monitor port to monitor port on the same side of a watch-ring.

DETAILED DESCRIPTION

We begin the detailed description with a conceptual overview of TICC. Implementation details that produce a practical and efficient implementation of TICC, fully realizing all its potential benefits are introduced next in Chapter 2 through a design for implementation, stated in the LISP programming language.

Chapter 1: Conceptual Overview of TICC

1.1.Overview of TICC Organization

TICC assumes an object centered software environment. It assumes active objects, i.e., each object would contain its own threads. Groups of objects in an application system may communicate with each other in parallel. Communication in TICC is connection oriented: One group of objects may communicate with another group if and only if there are TICC-pathways connecting them. TICC specifies a causal communication language, CCL. It has two parts: Its first part contains command statements, which are part of an API (Application Programmer Interface) for TICC. These are used by application objects to dynamically create, deploy, and modify networks of TICC-pathways used by application systems. Details on syntax of API command statements are not important and are thus not specified here. What is important is to specify how TICC-pathway creation, deployment, and maintenance tasks can be implemented. This is shown in Chapter 2.

The second part consists of causal communication statements of the form X(x).fwdarw.Y; (1) which may be read as "cause X to send signal x to Y," where X and Y may be application objects, or TICC ports and agents. If signal x is interpreted as a message transfer request, then on receiving signal x, Y will cause messages from X, kept in certain designated memories, to be delivered to their intended recipients. These causal statements have well-defined semantics in the context of message and signal transfer protocols used in TICC, which make it possible to compile statements of this form to code that is necessary to take messages from X and deliver them to their intended recipients. Application objects will use causal statements like this to initiate asynchronous message transfers.

In a multiprocessing system, consisting of (N+1) processors, [C, X.sub.1, X.sub.2, . . . , X.sub.N], N>=1, C will be the dedicated TICC-communications processor and the rest will be used to execute code associated with application objects. Code generated from API statements in CCL (i.e. code pertaining to creation, deployment and modification of networks of TICC-pathways) will be executed by the same processors, X.sub.1, that execute code associated with application objects. These API statements will belong to a TICC subsystem, called Communication System Manager (CSM). CSM is responsible for maintaining the TICC name space, establishing, and removing TICC-pathways and their components in a manner that is consistent with application security. Every processor X.sub.1 will have one or more instances of CSM. Every application object O will have a CSM instance associated with it. O will send requests to its designated CSM to establish new TICC communication pathways or remove and modify existing pathways. A CSM instance may serve more than one application object. Thus, for an object, O, if X were the processor that executed its code, then X would also execute all CSM related tasks for that object.

Codes generated by TICC-compiler for the CCL causal statements in an application object, O, would also have two parts: The first part would be executed by the same processor, X that executes code in object O. This would pertain to (a) setting up protocol sequences, called segments (because they correspond to TICC-pathway segments), that are necessary to realize the requested message transfers, and (b) presenting the segments to the dedicated TICC-communications processor, C. The second part is the collection of segments generated by the first part. The communications processor C will execute this collection of protocol segments, in parallel with application programs and cause messages to be delivered asynchronously to their intended recipients. Such message delivery in TICC would be self-synchronizing, loss less and buffer free.

1.2.Deployment of TICC

TICC communication processor may be implemented in software or in hardware. If it is implemented in software then it may be made available to application programmers through an Application Compiling Interface (ACI). ACI will contain a TICC parser and compiler that compiles TICC-causal communication statements, and it will also contain an Application Programmer Interface (API). Source code for objects in applications may be written in any language, say language L, for which a compiler is available. Statements in language L will appear in source code intermixed with statements in CCL. Application programs containing statements in language L mixed with CCL would be sent through a preprocessor that separates out these two types of statements, sending TICC-language statements to the TICC-compiler and computation statements in language L to their own associated compiler. The CCL compiler will put in place code for CCL statements at right places, in sequence with object code generated for statements in programming language L.

TICC technology for integrated communications in parallel with computations in multi-processor systems is so totally new that, to the best of our knowledge, there is nothing in prior literature that comes close to it except for one. It is a patent by B. Gopinath. [B. Gopinath and David Kurshan, "Composition of Systems of Objects by introducing coordination, projection and distribution," U.S. Pat. No. 5,640,546, Filed Feb. 17, 1995]. This patent does not however use the concept of causal communication language and the concept of executing communication protocols in parallel with on going computations. A paper in published literature [von Eicken, et al 1992]introduces the concept of "Integrated communication and computation." Its focus is on communications over local area nets (LAN). It does not address issues of communication among parallel processes in multiprocessing systems through shared memory, nor does it propose execution of communication protocols in parallel with computations or define a causal communication language to integrate computation and communication.

1.3. Agents and Ports in TICC Pathways

TICC pathways are identical to RESTCLK pathways. However, the way components in TICC pathway function are different from the way components function in RESTCLK. Whereas RESTCLK uses three kinds of control signals, TICC uses four. The control signals exchange protocols used in TICC are different from the ones used in RESTCLK. RESTCLK executed its signal exchange protocols non-deterministically in series with application processes using interrupt control. TICC executes its signal exchange protocols deterministically in parallel with application processes. These differences make it possible for TICC to provide guaranteed loss less, low latency data transfers among concurrent/parallel processes, none of which RESTCLK was able to do.

Each software object in a TICC based application system will contain TICC ports. Objects may send and receive messages only by using their ports. An object may have any number of ports. Application objects may not communicate with each other by setting global variables. All TICC-pathways will begin and end in ports and will contain TICC agents. A port and an agent are said to be tuned to each other (listening to each other) when they share certain control signal variables. An agent may be simultaneously tuned to several ports belonging to distinct objects. Each port will be tuned to one unique agent. No agent may be tuned to two ports of the same object. Always, the agent tuned to a port will be responsible to deliver messages to the port and dispatch messages out of the port via already established TICC pathways.

Ports and agents in TICC are very simple sequential machines. A tuned (ports, agent) pair will exchange control signals between them through shared control variables. These control signal exchanges will always happen in synchronous mode, i.e., every time a port or agent receives a control signal, it will be waiting for it and take appropriate action, in parallel with computations, as soon as it senses the signal. In every state, each port and agent would expect to receive only certain a priori specified control signals and wait for them. Every port and agent would always receive only the control signals it was waiting for. Exchange of control signals between tuned ports and agents will establish the context for data transfers to occur between objects connected by TICC-pathways. Always only control signals will travel over TICC-pathways.

Data transfers will occur through shared memory in all local communications (i.e., between objects in the same multiprocessor). In the efficient software implementation proposed here, once a port or agent is activated, a control signal exchange will take about 10 to 15 CPU cycles (about 5 nanoseconds in a 3-gigahertz machine). TICC-pathways will consume very little resources. They may be dynamically installed and removed within about 150 to 450 CPU cycles (50 to 150 nanoseconds in a
3-gigahertz machine). Once a pathway is established it would stay unless explicitly removed. In more than 75% of cases, needed pathways would already exist.

1.4.Group-to-Group Data Exchange in TICC 1.4.1. TICC Pathway Components and Structure

Basic mode of asynchronous data exchange in TICC is from one group of software objects, say grouping G1, to another grouping G2. To coordinate group-to-group (GTG) communications and transaction, one agent will be tuned to ports belonging to distinct objects in grouping G1 at the rate of one port per object and another will be tuned to ports of distinct objects in grouping G2, again at the rate of one port per object. Any two groups of objects, G1 and G2 that communicate with each other should be non-intersecting, i.e. no object may appear in both groups. Membership of objects in groupings like G1 and G2 may be changed dynamically. Point-to-Point (PTP) data exchange is a special case of this general GTG exchange.

FIG. 1 illustrates the structure of a TICC-pathway that connects two such groups. We will use this figure to identify and introduce the various components of a TICC-pathway and explain the functions performed by them.

In FIG. 1 agent, A1 is tuned to ports P.sub.3 and R.sub.3 of objects O.sub.1 and O.sub.2 in grouping, G1, via watch-rings w.sub.1 and w.sub.2, respectively. Similarly, agent A2 is tuned to ports Q.sub.3 and T.sub.3 of objects O.sub.3 and O.sub.4
in grouping, G2, via watch-rings w.sub.3 and w.sub.4, respectively. Watch-rings are connected to an agent at its composite agent port. Thus, watch-rings w.sub.1 and w.sub.2 in FIG. 1 are connected to agent A1 at its composite agent port. Similarly, watch-rings w.sub.3 and w.sub.4 are connected to agent A2 at its composite port. Always agents will be tuned to ports via watch-rings connected in this manner. Watch-rings transport control signals that are exchanged between an agent and ports that are tuned to it. As shown in FIG. 1, an agent sends start control signal, s, via delivery sides of watch-rings, to ports that are tuned to it. When ports sense the start signals sent by agents tuned to them, they will inform their parent objects that a message is waiting for them. Ports will receive completion signals, c, from their parent objects when they complete their computations. Ports will forward these completion signals, c, to agents who are tuned to them, via dispatch sides of watch-rings that connect them to the agents. This will inform the agents that parent objects of ports that are tuned to them, had completed their computations. The completion signal, c, can take one of three possible value, t, d, or h, as shown in FIG. 1. Every watch-ring in FIG. 1 is shown with names of control signals that may travel along its delivery and dispatch sides.

Agents A1 and A2 in FIG. 1 are connected to each other by clock-ring, C. Agents are attached to the clock-ring at their clock-ports. In FIG. 1, the clock-ring has only two agents. In general, a clock-ring may have several agents attached to it, as shown in FIG. 4. Clock-ring carries start control signal, s that is sent by each agent on the ring to its next agent, in clockwise direction, in order to transfer control to the next agent.

Clock-ring C has three memories associated with it: a scratchpad memory, SP, a read-memory, R, and a write-memory, W. When an object is informed by its port P that a message is waiting for it, it will read its input message via port P from the read-memory of the clock-ring to which the agent that is tuned to port P is attached. Before an object completes its computation, it may write output messages in the write-memory of the same clock-ring using the same port P. Objects in FIG. 1 have each eight ports. An object may in general have an arbitrary number of ports. 1.4.2. Computation and Communication around a Clock-ring

Initially all agents and ports in FIG. 1 would be In their quiescent state (also called idle state). Computation in objects around the clock-ring will begin only when one of the agents sends a start signal, s, to ports that are tuned to it. Since all agents are idle initially, the CSM will be used to start computations around clock-ring C, by injecting a start signal, s, into a designated agent on the clock-ring. (We will later see how this is done.) Let us suppose that A1 is the designated agent of clock-ring C and the CSM injected a start signal into agent A1. As soon as agent A1 senses this start signal s, it will go to its busy state and forward the start signal s to all the ports that are tuned to it, via the delivery sides of watch-rings that are connected to it. Thus, in FIG. 1 A1 would forward the start signal s to ports P.sub.3 and R.sub.3.via delivery sides of watch-rings w.sub.1 and w.sub.2. When ports P.sub.3 and R.sub.3 sense this start signal sent to them by agent A1, they would put themselves in the busy state and forward the start signal s to their respective objects, O.sub.1 and O.sub.2.

When objects O.sub.1 and O.sub.2 sense this start signal sent to them by their ports, and when they are ready to do so, they would read their input data from the read-memory of clock-ring C using their respective ports P.sub.3 and R.sub.3, and begin their computations. When they had completed their computations, or during their computations, they would write their output data into the write-memory of clock-ring C. When each object had completed all of its computations, it would send a completion signal, c, to the port through which it got its input data. Thus, in FIG. 1 object O.sub.1 would send completion signal c to port P.sub.3, when it had completed all of its computations. Similarly, object O.sub.2 would send completion signal c to port R.sub.3, when it had completed all of its computations. There are three kinds of completion signals, t, d, or h. The completion signal sent by an object to its port can be only one of these three. Thus, c would be equal to either t, or d or h. We will soon see the significance of these three different kinds of completion signals.

When a port senses the completion signal c sent to it by its parent object, it would forward the signal to the agent it is tuned to via the dispatch side of watch-ring that connects it to the agent, and then turn itself off, i.e., put itself in its idle state. Thus, in FIG. 1 port P.sub.3 would forward the completion signal c, it received from object O.sub.1, to agent A1 via the dispatch side of watch-ring w.sub.1, and turn itself off. So also, port R.sub.3 would forward the completion signal c, it received from object O.sub.2, to agent A1, via the dispatch side of watch-ring w.sub.2, and turn itself off. It may be noted, objects O.sub.1 and O.sub.2 may not complete their respective computations at the same time. Thus, A1 would receive completion signals from ports P.sub.3 and R.sub.3 at different times.

After agent A1 had sensed completion signals from all the ports it is tuned to, it would check whether a certain agreement protocol had been satisfied, i.e., whether all ports tuned to it had sent appropriate completion signals. This agreement protocol is a Boolean function of control signals exchanged between agent A1 and all the ports that are tuned to it (defined in Section 1.10 of this chapter). Thus in FIG. 1, agent A1 would check for satisfaction of agreement protocol after it had received completion signals from both ports P.sub.3 and R.sub.3. During all this time, data in the read-memory R of clock-ring C will remain undisturbed.

If agreement protocol is satisfied, then agent will do one of the following, depending on the kind of completion signals it received: a If the completion signal it received from at least one of the ports is t then agent will switch the memory designation of clock-ring C, i.e., what was previously the read-memory will now be designated as the write-memory, and what was previously the write-memory will now be designated as read-memory. After doing this, the agent will send a start signal s to its next agent on the clock-ring and turn itself off (i.e., put itself in its idle state). Thus, in FIG. 1 agent A1 would re-designate the read-memory R as the write-memory, and the write-memory W as the read-memory. After switching the memory designation in this manner, A1 would send a start signal s to its next agent, A2, and turn itself off (i.e. put itself into the idle state). If the completion signals an agent received from all the ports it is tuned to is d, then agent will not switch the memory designation of clock-ring C, but will simply send a start signal s to its next agent on its clock-ring and turn itself off. Thus in FIG. 1 if both ports P.sub.3 and R.sub.3 sent completion signal d to agent A1, then A1 would not switch memory designation of memories in dock-ring C, but would simply send start signal s to its next agent A2 and turn itself off. In this case, the agent would be forwarding to its next agent the same inputs, that ports tuned to the agent sent to their parent objects in their just completed computations. If the completion signals an agent received from ports tuned to it is, h, then agent will simply turn itself off. It will not send any start signal to anybody. This will halt computations around the clock-ring C.

Let us suppose that computations around clock-ring C in FIG. 1 did not halt and agent A1 did switch the read and write memories of clock-ring C. This is the normal mode of operation. Then A1 would have sent a start signal to agent A2 and turned itself off. When A2 sensed this start signal s, it would turn itself on, i.e., put itself in its busy state, and forward the start signal s to all ports that were tuned to it. Thus in FIG. 1, A2 would send start signals to ports Q.sub.3 and T.sub.3 of objects O.sub.3 and O.sub.4. Because of re-designation of read/write memories, memory R in FIG. 1 would now be the write-memory for objects O.sub.3 and O.sub.4, and memory W would be their read-memory. Thus, these objects would now be able to read messages written into memory W by objects O.sub.1 and O.sub.2.

At this point the same cycle of events that happened with ports P.sub.3, R.sub.3, and objects O.sub.1 and O.sub.2, when they recieved the start signal from agent A1, would now happen with ports Q.sub.3, T.sub.3, and objects O.sub.3 and O.sub.4. At the end of their computations, when A2 sensed satisfaction of agreement protocol in the completion signals it recieved from ports Q.sub.3 and T.sub.3, it would take appropriate action, and might forward replies from objects O.sub.3 and O.sub.4 back to objects O.sub.1 and O.sub.2. This might then start the next cycle of computations around the clock-ring C. Thus, at any given time only one group of objects around the clock-ring will be active, unless a pipeline implementation is used [see S. Das 1999, for details]. Each group around a clock-ring would receive new input data only after it had fully absorbed its current inputs and completed all of its computations (this will be true also for pipeline implementations). This is just like what happens in asynchronous hardware circuits. Thus, objects O.sub.1, O.sub.2, O.sub.3, and O.sub.4 in FIG. 4 would not need any input buffers. Since every port of every application object will be similarly connected to a clock-ring through an agent, none of them would need input buffers.

There is implicit transaction control with reply semantics for every GTG (group-to-group) transaction in TICC. The agreement protocol feature makes it possible to easily implement two-phase commit in all transactions.

1.4.3. Execution of Control Signal Exchange Protocols by Processor C

1.4.3.1. Segments and Pending Lists

Let us begin when objects and agents around clock-ring in FIG. 1 are in the following state: Objects O.sub.1 and O.sub.2 are busy doing their computations on inputs read from read-memory R of dock-ring C, and agent A1 and ports P.sub.3 and R.sub.3 are all in their busy state. Ports P.sub.3 and R.sub.3 would be expecting to receive completion signal c from their parent objects and agent A1 would be expecting to receive completion signals from ports P.sub.3 and R.sub.3.

Every port and agent will have a completion signal processing function associated with it. For a port, P, let P-Cfn, be this function and for an agent, A, let A-Cfn-P be the function A uses to process completion signals received from port P. Similarly, every port and agent will also have a start signal processing function. Let P-Sfn and A-Sfn be the start signal processing functions of port P and agent A. We will refer to functions like these as protocol functions. These functions implement the protocols for receiving and responding to start and completion signals. Thus for port P.sub.3 and agent A1 their, respective, completion signal processing protocol functions would be P3-Cfn and A1-Cfn-P3. For port, R.sub.3 and A1 their respective completion signal processing protocol functions would be R3-Cfn and A1-Cfn-R3. For agents A1 and A2 their start signal protocol functions would be, respectively, A1-Sfn and A2-Sfn. Similarly, the start signal protocol functions for ports P.sub.3, R.sub.3, Q.sub.3, and T.sub.3 would be, respectively, P.sub.3-Sfn, R3-Sfn, Q3-Sfn, and T3-Sfn.

When object O.sub.1 sends a completion signal c to its port P.sub.3 the sequence of protocol functions,

The SEGMENT-DRIVER executes pending lists cyclically starting with X.sub.1-pending-lst and returning to X1-pending-lst after it had executed X.sub.N-pending-lst. In each pending-lst, it will execute every segment in the order they are presented in the pending-lst. Segments will be removed from the pending-lst after execution. After emptying one pending-lst, SEGMENT-DRIVER will proceed to execute segments in the next pending-lst. However, when a pending-lst becomes empty it should somehow be replenished, in order for the process to continue for as long as the application system itself is working. This is done as follows. 1.4.3.2. Dynamic Updating of Pending Lists

As new segments are constructed, each processor X.sub.1 will not append the new segment directly to Xi-pending-lst, since this may interfere with execution of segments in the pending list by the SEGMENT-DRIVER. Instead, X.sub.1 will append new segments to another list, called X.sub.1-pl-upd-lst, which is the pending list update list of X.sub.1. When SEGMENT-DRIVER empties X.sub.1-pending-lst it would cause X.sub.1-pending-lst to be replenished with segments, which might have since accumulated in the X.sub.1-pl-upd-lst. This replenishment is done by a function called UPD-PENDING-LST. Operation of this function is controlled by a semaphore, called X.sub.1-semaphore. UPD-PENDING-LST will replenish X.sub.1-pending-lst with segments from X.sub.1-pl-upd-lst only if X.sub.1-semaphore is NIL. After X.sub.1-pending-lst had been replenished, this semaphore would be set to T. Every time processor X.sub.1 seeks to append a new segment to X.sub.1-pl-upd-lst, it will check whether the X-semaphore is T. If it were T then it would destroy the existing X.sub.1-pl-update list, install a new one, and append the new segment to this newly created X.sub.1-pl-upd-lst. After doing this X.sub.1 will reset the X.sub.1-semaphore, back to NIL.

By adopting this kind of producer/consumer paradigm, mutual interference between the parallel processors X.sub.1 and C is avoided for every X.sub.1. Thus while the segment driver is executing the segments in the various pending lists, each processor X.sub.1 may append new segments to X.sub.1-pl-upd-lst without interfering with the segment driver. It may be noted, while UPD-PENDING-LST is updating the X.sub.1-pending-lst and the semaphore is still NIL, it is quite possible for processor X.sub.1 to append new segments to X.sub.1-pl-upd-lst without interfering with the operation of UPD-PENDING-LST. 1.4.3.3. Segment Execution and Agreement Protocol Satisfaction

Now let us get back to what happened to segments S.sub.1 and S.sub.2 in equations (2) and (3). The diagram at the bottom of the figure in FIG. 2 shows the pending list associated with processor X that executed code in objects O.sub.1 and O.sub.2
shown in FIG. 1. We have assumed that the pending list S.sub.1, of equation (2) above, was presented to C first, and after some time the pending list S.sub.2, of equation (3) above, was presented. It might have happened this way because, in FIG. 1, object O.sub.1 might have sent its completion signal to its port P.sub.3 before object O.sub.2 sent its completion signal to port R.sub.3.

Processor C will be able to execute the protocol functions of segments S.sub.1 and S.sub.2 only after it had executed all the segments that were ahead of it in its input queue. Let us assume that in FIG. 2 there were Q segments ahead of segment S.sub.2 and (Q-k) segments ahead of segment S.sub.1. After executing a protocol function in a segment, the SEGMENT-DRIVER will execute the next protocol function only if the just executed function returned T. Thus, in segment S.sub.1 of equation (2), which is reproduced below for convenience, S.sub.1=(P3-Cfn, A1-Ctn-P.sub.3, A2-Sfn, Q.sub.3-Sfn, T3-Sfn) (2) when P3-Cfn is executed, it would return T, because at that time P.sub.3 was expecting to receive a completion signal from its parent object. However, when the next protocol function, A1-Cfn-P3 is executed, it would return NIL, because agreement protocol would not have been satisfied yet. Thus, segment S.sub.1 execution would halt after A1-Cfn-P3. The remaining protocol functions in segment S.sub.1 would not be executed. Thus, no message would be delivered to ports Q.sub.3 and T.sub.3.

However, execution of A1-Cfn-P3 in S.sub.1, would cause agent A1 to update the status of its interactions with its ports. Thus, it would remember the fact that a completion signal had been received from port P.sub.3 and remember what completion signal it was. Thus, when the function A1-Cfn-R.sub.3 is later executed in segment S.sub.2, S.sub.2=(R.sub.3-Cfn, A1-Cfn-R3, A2-Sfn, Q3-Sfn, T3-Sfn) (3) agreement protocol would be satisfied. This is because, agent A1 would have received by this time completion signals from all ports tuned to it. Thus, the start protocol functions (A2-Sfn, Q3-Sfn, T3-Sfn) in S.sub.2 would all be now executed. This would result in objects O.sub.3 and O.sub.4 being notified by ports Q.sub.3 and T.sub.3, respectively, that a message is waiting for them in their read-memory in clock-ring C. After this notification, objects O.sub.3 and O.sub.4 could read their input messages at any time they chose. Message would remain undisturbed in the read-memory until it had been fully read and absorbed. This constitutes message delivery in TICC.

1.4.4. Asynchronous Message Delivery is Loss Less and Self-synchronizing

Message delivery here is loss less because all of the following processes that were involved in the steps needed to deliver the message were deterministic processes: Constructing and appending segments S.sub.1 and S.sub.2 to X.sub.1-pl-upd-lst, Transferring contents of X.sub.1-pl-upd-lst to X.sub.1-pending-lst, and Execution of pending segments in X.sub.1-pending-lst. These operations were triggered by definite events that had to occur in every computational sequence, every time a message transfer request is made by an object. Unless the programs that perform these operations themselves failed, all the tasks enumerated above would be performed without fail. Since each pending segment is removed from pending-lst after its execution, each message would be delivered only once. Message exchange is asynchronous because any object could send a message to any other object at any time, if a TICC pathway existed between them. If one did not exist, then a new pathway connecting them could be dynamically established, if application security permitted it. These asynchronous message deliveries are self-synchronizing because agents (ports) do not have to synchronize with ports (agents) in order to pass start (completion) signals to them, and ports do not have to synchronize with their parent objects to send them start signals. So also, objects do not have to synchronize with their ports in order to send completion signals to them.

Observation 1: Thus, we will have loss less self-synchronizing asynchronous message transfers as long as TICC-pathways are not changed before pending segments are executed.

Suppose t.sub.c was the time when all ports tuned to an agent had sent completion signals to the agent. This is the time when agreement protocol would be satisfied and messages would be ready to be sent to their destinations. Let t.sub.d be the time when the messages were actually delivered to destination ports. Then (t.sub.d-t.sub.c) would be the communication latency, L. This can be computed as discussed in the next subsection.

1.5. Communication Latency.

Assume that on the average there were at most n ports tuned to each agent. As shown in FIG. 2, assume that there were Q pending messages awaiting delivery ahead of segment S.sub.2. As mentioned earlier, execution of segments S.sub.1 and S.sub.2
would cause satisfaction of agreement protocol. In FIG. 1, n=2, and thus there were two segments in the pending list and each segment S.sub.1 and S.sub.2 had five protocol functions. Among the five protocol functions in S.sub.1, only two were executed because, in this case, agreement protocol was not satisfied. However, in S.sub.2 all the five protocol functions were executed. In general, for an arbitrary n each segment would contain (n+3) protocol functions and the pending list will contain in all n segments, one corresponding to each port. In (n-1) of these n segments agreement protocol would not be satisfied and thus in each of these (n-1) segments only two protocol functions would be executed. Message would be delivered only when all the n segments were executed. This would thus take in all, 2 (n-1)+(n+3)=(3n+1), protocol function executions. Assume that on the average t CPU cycles were needed to execute one protocol function. Then it would take t (3n+1) CPU cycles to execute all protocol functions in all the n segments. Thus, each message delivery will take at least t (3n+1) CPU cycles to get delivered. If we had a F gigahertz CPU then each CPU cycle would take (1/F) nanoseconds. Or, in one microsecond [1000F/t(3n+1)] messages could be delivered.

To this we have to add the time it took to construct the first segment and install it in X-pending-lst. On the average, this will take about the same number of CPU cycles as it takes to execute one complete segment, namely t (n+3) CPU cycles. Thus, the total latency per message will be [t(3n+1)+t(n+3)] CPU cycles. Let Q.sub.max and Q.sub.ave be, respectively, the maximum and average lengths of queue in front of the segment driver. We then get the following expressions for maximum, average, and minimum latencies, L.sub.max, L.sub.ave and L.sub.min: L.sub.max.sub.i=[(t(3n+1)Q.sub.max+t(n+3)]/F) nanoseconds (4a) L.sub.ave=([t(3n+1)Q.sub.ave+t(n+3)]/F) nanoseconds, and (4b) L.sub.min=[t(3n+1)+t(n+3)]/F nanoseconds. (4c) all in (ns) nanoseconds. It is instructive to get an idea of numbers one gets for typical values of n, t and F. Typical value for n would be between 1 and 2. Let us assume that n=2, and F=3 (i.e., we have a 3-gigahertz machine). Typical value for t is 15 CPU cycles. Thus, it would take about 5 nanoseconds to execute a protocol function. Then minimum latency for the first message delivery is 60 nanoseconds in other words, there could be at least 16 full message deliveries per microsecond. Actually there could be more, since the time taken to present a segment to C is pertinent only for the first segment presented to C. The processor C would be busy executing segments already presented to it, while the later segments were being appended to the various pl-upd-lsts. That is why, in equations (4a) and (4b) above, the time, t(n+3)/F, needed to present a segment to processor C is added to the latency expression only once. Let us now calculate L.sub.ave based on expected value of number of messages in queue in front of the SEGMENT-DRIVER.

Let M.sub.ave be the expected value of total number messages that are generated every microsecond. M.sub.ave will be the sum of expected values of messages generated per microsecond by each processor. Let us assume that the expected value of number of messages generated per microsecond by processor X.sub.1, is m.sub.t messages/microsecond. Then M.sub.ave=[m.sub.1+m.sub.2+ . . . +m.sub.N] (5) and the expected value of queue length Q.sub.ave will be, Q.sub.ave=M.sub.ave-(1000F/t(3n+1)) (6) since processor C would execute (1000F/t(3n+1)) messages in that microsecond. Let us assume that the average value of m.sub.t is 1, i.e., on the average, each processor would generate one message in every microsecond (which is rather a very high rate of message generation). Let us assume, N=100, i.e., there were 100 processors in the multi-processor, g=3 gigahertz, t=15 CPU cycles, and on the average each agent had two ports tuned to it, i.e. n=2. Then M.sub.ave=100 and from equation (6) we get Q.sub.ave=15. For this value of Q.sub.ave we get from equation (4b) L.sub.ave=550 nanoseconds. (7) During this 550 nanoseconds about 15 group to group messages would have been delivered. If we assume that the maximum rate of message generation per processor is 2 message/microsecond, then the maximum possible latency for 100 processors, would be, L.sub.max=4.05 microseconds (8) The important point to notice here is that during the 4.05 microseconds about 115 group-to-group messages would have been delivered. In actual practice, the message generation rate will be much less than the rates we have assumed and thus most of the time latency will be no more than 100 nanoseconds (1.7 times the minimum latency).

This calculation did not account for time taken to enforce application system security at every port of an application system. As may be seen from the design for implementation described in Chapter 2, dynamic security enforcement would consume on the average about as much time as the time needed to execute two protocol functions for each port (i.e. about 30 memory cycles). For n ports this would amount to 2nt CPU cycles per message. Thus, with dynamic security enforcement the various latencies would be of the order of, L.sub.max.sub.i=[t(5n+1)Q.sub.max+t(n+3)]/F nanoseconds, (9a) L.sub.ave=[t(5n+1)Q.sub.ave+t(n+3)]/F nanoseconds, and (9b) L.sub.min=[t(5n+1)+t(n+3)]/F nanoseconds, (9c) For typical values, N=100, n=2, t=15, and F=3
assumed above, we have L.sub.max=6.5 ms L.sub.ave=850 ns, and L.sub.min=80 ns. (10) where ms stands for microseconds and ns stands for nanoseconds.

These are latencies for group-to-group message transfers with n=2, and with dynamic security enforcement. On the average, about 18 messages would be delivered every microsecond. This compares with 5000 to 10000 CPU cycles latency it might take in current technologies for each point-to-point (not group-to-group) message transfer, without security enforcement. For n=2, with a 3-gigahertz processor, it would take a minimum of 3.33 to 6.66 microseconds per message delivery in the best current technology, compared with 80 nanoseconds (equation 10) in TICC (and this is with security enforcement). For delivering 19.6 messages it would take a minimum of 65.27 to 130.54 microseconds, with in a multiprocessor, where messages are exchanged through shared memory, compared with 6.5 microseconds (equation 10) in TICC

If there are 1000 processors, then by using 10 communication processors one can maintain the same latency rates in TICC. TICC is scalable to any number of processors. But, this is not possible in other technologies. The advantages of using TICC are quite dramatic. We believe, group-to-group message transfers with latency rates estimated above for TICC are simply impossible in any other current technology.

1.6. Dynamic Flexibility 1.6.1. An Overview

TICC allows pathways in a TICC-network to be dynamically modified without service interruption in application systems. All of the following may be dynamically performed (by "dynamic" we mean that changes may be introduced while an application system is running without service interruption and without loss of benefits gained from computations performed before changes were completed; the only cost will be a certain delay): 1) New pathways may be created and established, existing pathways may be removed or modified, 2) Monitoring networks to monitor messages flowing through any point in a network of TICC-pathways (but not inside application objects) may be installed and removed. This feature may be used to dynamically debug parallel computations. 3) Computations performed by an application object may be temporarily suspended and resumed as and when needed. 4) Application systems may be dynamically reconfigured without service interruption. 5) Application system security may be dynamically enforced on each object at every one of its ports for every message that is sent or received by the port. Such security enforcement systems may themselves by dynamically monitored. 6) Performance of application objects at every one of their ports may be dynamically monitored, incorporating methods to anticipate impending failures and take corrective measures, at little cost in time and resources. 7) New versions of objects may be dynamically tested in exactly the same context in which their older versions function, in parallel with older versions.

These are the principal features and advantages of dynamic flexibility in TICC based systems. All of the above may be realized in TICC without in any way planning for them, at the time of design and implementation of application-systems. Thus, one may respond to unexpected system failures by dynamically installing debugging networks and reconfiguring an application system as needed. One does not have to anticipate possible system failures and provide additional code in application objects to respond to them. Dynamic debugging of parallel processes, and dynamic reconfiguration without advance planning or service interruption, are novel features that, to the best of our knowledge, only TICC technology can make available. With all of these benefits, communication latencies in TICC would still be about one tenth or one hundredth of latencies in other technologies.

We begin our discussion below outlining the technical details that make all of this possible. After this, we will introduce TICC network structures, their operation, and show how one may monitor data, flowing through TICC-pathways and obtain the dynamic flexibility claimed above.

1.6.2. Pending Flags and Protection of Protocol Function Execution

Every protocol function in TICC will be associated with a flag called pending flag. For a protocol function, f, its associated pending flag will have the name f-pending. For the protocol functions associated with agents and ports in FIG. 1 we will have the following pending flags: A1-sfn-pending, A1-Cfn-P3-pending, A1-Ctn-R3-pending A2-Sfn-pending, A2-Cfn-Q3-pending, A2-cfn-T3-pending, P.sub.3-Sfn-pending, R.sub.3-Sfn-pending, Q.sub.3Sfn-pending, T.sub.3-Sfn-pending, P.sub.3-Cfn-pending, R.sub.3-Cfn-pending, Q.sub.3Cfn-pending and T3-Sfn-pending. Every time a protocol function is put in a segment that is appended to a pl-upd-lst, its pending flag would be set to T. Thus, when segments S.sub.1 and S.sub.2 are presented to C all the pending flags of protocol functions in the segments would have been set to T. These flags would be reset to NIL only after the protocol functions have been executed by the SEGMENT-DRIVER.

CSM would examine pending flags of protocol functions associated with a pathway before it begins to modify the pathway, in order to determine whether the pathway could be safely modified without disturbing any of the pending message transfers. If the pending flag associated with any of the protocol functions of a pathway is T, it would indicate that one or more of the pending message transfers might be disrupted, if those pathways were changed. Therefore, CSM would wait until pending flags became NIL, before changing those TICC-pathways. After they become NIL, the CSM may lock pertinent agents on the pathways so that they would not initiate new message transfer requests while the pathways were being changed. When all needed changes had been done, then CSM would release all locked agents and at that point normal operations would resume. Details on incorporating pending flags and using them correctly are presented in the TICC design described in LISP in Chapter 2.

By making use of pending flags CSM would guarantee that none of the pending message transfers in an application, would be interrupted or lost by updates that might have to be done dynamically on networks of TICC-pathways. Observation 2: Thus, we will have loss less message transfers even if a need to modify a TICC-pathway arose before pending segments were executed.

1.7. Structural and Semantic Restrictions on TICC-pathways

All TICC pathways have to obey the structural and semantic restrictions presented in this subsection. Many of the restrictions have been already mentioned in text. It is useful to first define what a TICC-pathway is. We have already seen examples of such pathways. This is given below (words in italics refer to technical terms used to refer to TICC-pathway components):

Each TICC-pathway specifies a connection between ports belonging to distinct application objects in one group, say group G1, to ports belonging to distinct application objects in another group, say group G2, where G1 and G2 are non-intersecting, i.e., the same object does not appear in both groups. There is no restriction on the number of objects that each of these two groups may contain.

Each selected port of group G1 will have one end of a watch-ring attached to it. The other end of this watch-ring will be connected to the composite agent port of an agent, say agent A1, as shown in FIG. 1. Thus, if the group had n selected ports then agent A1 will have n watch-rings connected to its composite agent port. Agent A1 is said to be tuned to all the selected ports of group G1 when such watch-ring connections are made between the agent and the selected ports. The clock-port of this agent A1 will be attached to a clock-ring, C. With this configuration, all selected ports of group G1 would be tuned to the same agent A1, all objects in group G1 will have access rights to memories of the clock-ring.

Similarly, selected ports of group G2 will all be tuned to another agent, say agent A2, by establishing watch-ring connections between the composite agent port of A2 and the selected ports in G2. The clock-port of agent A2 will also be attached to the same clock-ring C. All objects in group G2 will also have access rights to memories of clock-ring.

Each clock-ring will contain three kinds of memories: Read-memory, write-memory, and a scratchpad memory. Ports tuned to an agent, that is attached to a clock-ring C, will read input messages from the read-memory of clock-ring C and deliver them to their parent objects. They will write output messages into the write-memory of C. Parent objects of ports that are tuned to an agent attached to C may use the scratchpad memory of C to exchange data among themselves to coordinate their activities, while they are performing their respective computations, on inputs received from the read-memory of the clock-ring. If an object had several ports, it might simultaneously enjoy access rights to memories of several distinct clock-rings, one for each port of the object.

The total configuration, thus obtained consisting of ports, watch-rings, agents, and clock-ring is a TICC-pathway that connects objects in, group G1, to objects in, group G2. Each such TICC-pathway containing a clock-ring with only two agents is bi-directional, i.e., each group connected to the pathway may send messages to the other group.

A clock-ring may have more than two agents attached to it. Thus, as shown in FIG. 4 a clock-ring may have several groups of objects, whose selected ports are tuned to agents attached to the clock-ring. If a clock-ring has m agents attached to it, with each agent tuned to a group of selected object ports, then the clock-ring will contain m TICC-pathways. Each such pathway will connect one group of objects around the clock-ring to its next group on the clock-ring, in clockwise direction. Thus, the clock-ring in FIG. 4 contains six TICC-pathways. In such cases, each group around the clock-ring may send messages only to its next group on the clock-ring in the clockwise direction. No group on a clock-ring can directly send messages back to its previous group. Messages and control signals may not travel in anti-clockwise direction around a clock-ring.

Every agent will send a start control signal, s, to its next agent on a clock-ring, in clockwise direction, when it had completed its task and is ready to activate its next agent. The next agent will always be waiting to receive such activation. Each agent will send start control signal, s, to ports that are tuned to it via the delivery sides of watch-rings attached to its composite agent port. Finally, each agent will receive from each port that is tuned to it, a completion signal, c, via dispatch-side of watch-ring, when parent objects of the ports complete their, respective, computations. In all cases only control signals will travel along TICC-pathways. They will establish the context for message transfers to occur through shared memories of the clock-ring.

In the design considered here only one group of objects around a clock-ring may be active performing computations on inputs read from the read-memory of the clocking. A pipelined implementation, in which several agents around a clock-ring could be simultaneously active, is also possible.

There are structural and semantic restrictions on ports, which may be connected by TICC-pathways. Many of them have been already mentioned in the text. They are summarized below.

Structural Restrictions on TICC-Pathways

Rule 1: Every agent will have a unique parent clock-ring to which it is attached.

Rule 2: A clock-ring may have several agents attached to it.

Rule 3: An agent may be simultaneous tuned to several ports, each belonging to a distinct object.

Rule 4: No two ports of an object may be tuned to any two agents on the same clock-ring.

Thus, no two ports of the same object may ever be connected by a TICC-pathway. These rules guarantee that network of TICC-pathways will be deadlock free.

Semantic Restriction on TICC-Pathways

Security Restriction: Two ports belonging to two different application objects may be connected by a TICC-pathway only if application security permits at least one of them to send messages to the other.

The following definitions are useful while talking about components of TICC pathways: An agent is free if it is not attached to any clock-ring, even if it is tuned to some ports. An agent is bound, if it is attached to a clock-ring even if it is not tuned to any port. Similarly, a port is free if it is not tuned to an agent, even if it is attached to a watch-ring. A port is bound if it is tuned to an agent, even if the agent is free. A clock-ring is bound if at least one of the agents attached to it is tuned to a port. A clock-ring is fee (or isolated) if none of the agents attached to it are tuned to ports. An agent not attached to a clock-ring and not tuned to any ports is an isolated agent. Similarly, an object with all its ports free is an isolated object.

New watch-rings may be created and installed to tune any free agent to a free port, if it does not violate any of the connectivity rules listed above and does not violate application system security.

1.8. Models of Computation in TICC

Once started, sequential computational activity around a clock-ring would keep migrating cyclically around the clock-ring from one group to its next, in clockwise direction, and back to the group where it all started. This sequential cyclical activity may be stopped and started when necessary by the objects themselves or by CSM. Thus, clock-rings model sequential processes at the object level. One may think of clock-rings as threads that sequence such processes at the objects level. FIG. 4
shows a typical sequential computation containing objects O.sub.1, O.sub.2 through O.sub.15 organized into six groups. Ports belonging to distinct objects in different groups are, respectively, tuned to agents A.sub.1 through A.sub.6 shown in FIG. 4. All of these agents are attached to clock-ring C, which contains memories R (read), W (write) and SP (scratchpad). At any given time, only one agent and objects (ports) tuned to it would be active. Normally, as each group completes its computation, computational activity would be transferred cyclically in clockwise direction around the clock ring. In pipelined implementation of TICC, more than one group of objects around a clock-ring may be active at the same time. Here also, computation will migrate around the clock-ring from one group to another in clockwise direction. Parallel computations in TICC would consist of several such sequential clock-rings running in parallel, and communicating with each other.

There are five pathway abstractions (components) in TICC. Four of them, namely agents, ports, watch-rings, and clock-rings have been already introduced. The fifth abstraction is the collator object shown in FIG. 5. The collator object in this figure connects four sequential computational rings, which may all run in parallel with each other in different processors of a multi-processor. The collator object in FIG. 5 will gather data from different clock-rings collate them in right combinations (which may be based on a time stamping scheme) and deliver them to one or more of the agents connected to it. Details of collator object implementation would depend on application system characteristics, method used to time-stamp data generated by objects and method used to specify the manner in which data should be collated. Once these applications dependent details are known it is not hard to implement the collection of collator objects necessary for the applications. Details of collator object design and implementation are not discussed in this patent application. Claims on this patent do not depend on details of collator object implementation.

It is assumed through out that, application objects will perform all of their input/output operations only through their ports and will not share information with each other through global variables.

There is a structural isomorphism between TICC-pathways in an application system, and the structure of computations (flow of computations) in the application at the objects level. This is quite similar to the structural isomorphism that holds in asynchronous (and synchronous) hardware systems between control (clock) signal distribution systems and hardware subsystems. In such hardware systems, each subsystem would receive its next input only after it had completed its computations on its current input. The same restriction holds for objects in TICC-networks. Thus, one may think of networks of TICC-pathways as the software equivalent of control (clock) signal distribution systems in hardware. No hardware system could be implemented independent of its control (clock) signal distribution system. Similarly, no software system should be implemented independent of its message distribution system. This is precisely what is being advocated and accomplished by TICC, the Technology for Integrated Computation and Communication.

1.9. Exchanging Control Signals

As mentioned earlier, only control signals will travel through TICC-pathways. They would establish the context for data transfers to occur via shared memories associated with clock-rings. An agent, A.sub.i, on a clock-ring may send a start-control signal, s, to its next agent A.sub.(i+1) on its clock-ring, after it senses satisfaction of agreement protocol. After sending such a control signal to A.sub.(i+1), agent A.sub.i would transfer itself to its idle-state. A.sub.(i+1) will go to its busy-state after it senses the start control signal, s, sent by A.sub.i. Only one agent can be in the busy-state on any clock-ring at any given time, unless a pipelined implementation is used. We will say an agent/port is busy when it is in its busy-state, and it is idle when it is in its idle-state.

When an agent becomes busy, i.e., just after it senses start signal s sent by its previous agent, it will send start signal s to all ports that are tuned to it. When a port senses the start signal sent to i by the agent tuned to it, it will become busy and inform its parent object that data is waiting for it in its read-memory. It will do this by sending start signal s to its parent object. Thus, for example in FIG. 1, when P.sub.3 senses the start signal sent by its agent A1, it will send a start signal s to its parent object O.sub.1. The object will become busy with its port P, when it senses the start signal s sent to it by port P.

As mentioned earlier, control signal exchanges occur through shared variables. Thus for a port, say port P, to send controls signal s to its parent object, say object X, both P and X should share a common control signal variable. Let P-s-X be this variable. In order to send signal sto X, the start protocol function of P, P-Sfn, would set the variable P-s-X to true. P would do this immediately after it senses the start signal sent to it by its agent. The start protocol function of P, P-Sfn, would sense the start signal sent by its agent, A, by noticing that the start signal variable A-s-P is true. All of this will happen, of course, only when P-Sfn is executed by the communications processor, C. All agents on a clock-ring, C, will share the same start control signal variable among them, in order to sent start signals to their respective next agents. For the clock-ring C, this start signal variable will be C-s-var. Thus, C-s-var will be the start control signal variable shared by agents A.sub.i and A.sub.(i+1). A.sub.i would set C-s-var to true in order send signal s to A.sub.(i+1). A.sub.i would do this only after it senses satisfaction of agreement protocol by objects that are tuned to it. The manner in which segments are constructed and presented to C would guarantee that only A.sub.(i+1) would sense that C-s-var had been set to true, and not any other agent on the clock-ring C. This method of passing start signals from one agent to another around a clock-ring, cannot be used if pipeline implementation is used. In that case, each agent A.sub.i should use a unique control signal variable, A.sub.i-s-A.sub.(i+1), to send a start signal to its next agent, A.sub.(i+1).

If all agents on a clock-ring are idle, then one may use the CSM to set C-s-var to true and immediately execute the start signal protocol function of the designated agent of the clock-ring C. if A.sub.i is the designated agent of clock-ring C, then the CSM would execute A.sub.i-Sfn immediately after setting C-s-var to true. This will cause A.sub.i to wake up (i.e., put itself in the busy state) and respond in the appropriate manner. This will start computations around the clock-ring C. In all cases, no coordination is necessary between control signal senders and receivers. Senders could set the relevant control signal variables that were accessible to them, at any time. Receivers would sense them when their protocol functions are executed by C.

A parent object X of port P would sense that P-s-X had become true, when it polls its port P. After sensing this start signal sent by its port P, X may read its input data from the read-memory of clock-ring it is tuned to via port P at any time it chooses. An object may poll its ports in a round robin fashion, or in any fashion controlled by threads that drive its methods. When X had completed writing all its output data into its write-memory, it would send a completion signal c, to P. It would do this by setting true the associated control variable, X-c-P that it shared with P. It may set this variable to one of three values t, d, or h. P will sense this completion signal sent by its parent X only when its completion signal protocol function, P-Ctn, is executed by C.

Elapsed times between setting and sensing of control signal variables would be a part of latencies encountered in control signal exchange processes. We have already seen how latencies in TICC may be calculated.

At this point one may note the following: No prior agreement (synchronization) is needed between a port and its parent object, in order for the port to send signal s to its parent, or for the parent to send a completion signal c back to the port. They may each set the relevant control signal variables true at any time they please without any coordination with the other. Similarly, no prior agreement is needed for control signal exchange to occur between a tuned (port, agent) and (agent, agent) pairs as well. Signal exchanges along TICC pathways will always occur synchronously, because each receiving agent/port would always be waiting for the signal it would receive, and would correctly respond to it as soon as it senses receipt of the signal. Observation 3: The synchronous signal exchanges that occur between pathway components in TICC would ultimately cause asynchronous self-synchronized data transfers to occur among objects in an application system. Since these signal exchanges occur in parallel with computations, objects do not have to waste time trying to synchronize with message recipients.

1.10. Agreement Protocol, AP

Completion signals are used in TICC to define four composite signals: One is the agreement protocol (AP), and the other three are composite-trigger, CT, composite-do-not-switch, CD, and composite-halt, CH, signals. These composite signals are defined as Boolean functions of completion signals received from ports tuned to an agent. The composite port of an agent (See FIG. 1) is responsible to compute these composite signals from completion signals it receives from ports tuned to them (hence the name).

If P.sub.1, P.sub.2, . . . , P.sub.n, are ports belonging to a group, G, which are all tuned to agent, A, then AP, CT, CD and CH may be defined as follows: AP=[(d.sub.1Vt.sub.1Vh.sub.1)&(d.sub.2Vt.sub.2Vh.sub.2) & . . . & (d.sub.nVt.sub.nVh.sub.n)], CT=(t.sub.1Vt.sub.2V . . . Vt.sub.n), CD=(d.sub.1 & d.sub.2 & . . . & d.sub.n), and CH=(h.sub.1 & h.sub.2 & . . . & h.sub.n), where t.sub.i, d.sub.i, h.sub.i for I=1, 2, . . . , n are the completion signals that port P.sub.i may send to agent A. At any time P.sub.i will send only of one the three completion signals, t.sub.i, d.sub.i or h.sub.i. The response of an agent sensing satisfaction of this agreement protocol, i.e. sensing that AP is true, will depend on the following cases: 1) CT is true: In this case the agent will switch the read/write memory designation of memories in its parent clock-ring, send a start signal, s, to the next agent on its parent clock-ring and then transfer itself to its idle state. Here at least one object In grouping G would have written output data into the write-memory of the parent clock-ring of the agent. 2) CD is true: In this case the agent will not switch the read/write memory designation of memories in its parent clock-ring, but will still send start signal, s, to its next agent and then transfer itself to its idle state. In this case, none of the objects in grouping G would have written output data into the write-memory of the parent clock-ring of the agent, and all objects in G would have agreed that it was necessary to pass on the same input data that they received to the next group of objects. This case will commonly arise only for monitor agents, which cause data to be observed and passed on unchanged.
3) CH is true: In this case computation will halt because the agent will not send signal s to its next agent, but will transfer itself to its idle state. All objects in grouping G would have agreed that computation should be halted. This case will usually occur with termination objects, which are specially designed to terminate specific computations.

At this point one may note the following: In each cycle of computation around a clock-ring, when new input data was ready for delivery in a read-memory, its recipient object, say object O, would be notified by its port, say port P, that is tuned to the read-memory. Object O would use this input data whenever it is ready to use it and take the time it needed to fully process it. Input data would remain undisturbed in read-memory until object O had completed all computations associated with that input data. Only after this computation had been completed would the agent tuned to port P, say agent A, sense agreement protocol satisfaction. It is only when the agreement protocol is satisfied would agent A transfer computations to the next group of objects on the clock-ring.

In computations cyding around a clock-ring, new input data would arrive at an object tuned to that clock-ring only once in each cycle of computation. Thus, new input data would arrive only after object O had completed all its computations on its previous input data. If pipelined implementation is used it may arrive soon after object O had completed its computations on its current inputs. In all cases, current input data would have been fully read and used before the arrival of the next input data, and the object would be free to receive its next input when it arrives. Hence the following holds true: Observation 4: All message transfers are buffer free. No input queues are needed to present messages to objects.

Protocol functions that have to be executed in order to deliver messages would always be presented to the communication processor C in the form of segments. Construction of segments, their presentation to C, and execution of protocol functions in segments are all deterministic processes. There is no non-determinism involved here. In addition, no pending messages would be lost because of dynamic updates that might be performed on TICC-pathways.

1.11. Monitor Agents on Watch-rings

It should be noted that all monitoring, reconfiguration, and dynamic evolution TICC-networks presented in the next few sections are the same kind of networks that were initially proposed in RESTCLK. In all the schemes presented below application objects need not be designed and implemented with additional code embedded in them in order to accommodate the kinds of monitoring, reconfiguration and dynamic updating operations that they might encounter during their lifetimes. Any desired change operation might be performed at any time during the lifetime of an application object, whether they had been a priori planned for or not. Application object code is totally independent of change requirements.

To the best of our knowledge, this unique feature holds true and is made available to application systems, only by TICC. RESTCLK proposed these schemes first, but it could not make them available for use in application systems, because of inability to produce an efficient implementation and guarantee loss less message delivery. TICC is the first system that makes it possible to fully realize the advantages of dynamic flexibility in practice, through its efficient implementation and loss less message delivery, achieved because of total integration of communication with computation.

The TICC-pathways shown in FIGS. 3, 6, 7, 8 and 9 are identical to the ones first proposed in RESTCLK. However, control signals exchanged in TICC pathways are different from the ones exchanged in RESTCLK. In addition, TICC agreement protocol is different from RESTCLK agreement protocol, and TICC protocols are executed in parallel with computations in a deterministic manner. In RESTCLK, they where executed in series with computations through interrupt control, in a non-deterministic manner.

Just as agents are attached to clock-rings, agents may also be attached to watch-rings. These agents are called monitor agents. FIG. 3 shows two monitor agents, B.sub.1 and B.sub.2, attached to watch-rings, w.sub.1, which tunes agent A to port P. Clock-ring C in the figure is the parent clock-ring of agent A (i.e., the clock-ring to which agent A is attached via its clock-port). By definition, all ports tuned to monitor agents B.sub.1 and B.sub.2 will have the same access rights to read from and write into memories of clock-ring C, as ports tuned to agent A. Thus, the monitoring agents B.sub.1 and B.sub.2 in FIG. 3 may be viewed as being descendants of agent A. It may be noted, objects M.sub.1 and M.sub.2 in FIG. 3 will have the same access rights to memories of clock-ring C, as object O.sub.1.

Just as monitoring agents might be attached to watch-ring w1 in FIG. 3, monitoring agents may also be recursively attached to watch-rings of other monitoring agents. Thus, monitor agents may be attached to watch-rings w.sub.2 and w.sub.3, in FIG. 3.

Monitor agents attached to the delivery side of a watch-ring (the side that goes from agent A to port P in FIG. 3) may be used to observe data flowing into port P. Thus agent B.sub.1 in FIG. 2, which is tuned to object M.sub.1, may be used to monitor data flowing into port P of object O.sub.1. Monitor agents attached to the dispatch side (i.e. the side that goes from port P to agent A in FIG. 2) may be used to observe data flowing out of the port. Thus agent B.sub.2 in FIG. 3, which is tuned to object M.sub.2, may be used to monitor data flowing out of port P.

Monitor agent B.sub.1 in FIG. 3, on the delivery side of watch ring, w.sub.1, would trap the start control signal, s, on its way from agent A to port P, and forward this signal to monitoring objects via ports that were tuned to it. These monitoring objects (like object M.sub.1 in FIG. 2) would then read data from the read-memory of clock-ring C and do whatever they wished to do with it. They may read it and pass it on to external observers or other objects. They may check it for security or errors (for debugging purposes) and pass it on to object O.sub.1. They may modify the data before passing it on to O.sub.1. They may encrypt (decrypt) it before passing it on to O.sub.1. They may use the input and output data to check new versions of application system objects in parallel with the existing versions. They may even halt computations if they detected a serious error or security violation.

When they had completed doing whatever they wanted to do, they would send completion signals, t, d, or h, back to the monitor agent B.sub.1. After sensing satisfaction of agreement protocol, monitor agent B.sub.1 would respond as described in Section 1.10 above, but for one difference: If the port tuned to agent B.sub.1 sent out completion signal t to agent B.sub.1, then B.sub.1 would do read/write memory switching and pass on the start signal s it received from agent A to port P. If it received completion signal d then B.sub.1 would not switch memories, but still pass on signal s to port P. If it received completion signal h then it would block the message from port P by not sending any start signal to it. This may happen, for example, if monitor object M.sub.1 detected a security violation.

It is possible to have more than one monitor attached to the delivery side of watch-ring w.sub.1. If, for example there was another monitor agent, say agent B.sub.3, next to B.sub.1 on the delivery side of the same watch-ring w.sub.1 in FIG. 3, then B.sub.3 would trap the signal sent by B.sub.1 to port P, and the same sequence of activities would then be repeated by B.sub.3 and objects tuned to it. There is in principle, no limit to the number of monitor agents that one may attach to a watch-ring. But, adding a monitoring agent to a TICC-pathway can double the message transfer latency. Thus, on the latency scale of TICC, dynamic monitoring is expensive. But on the latency scale of current technology it is not expensive.

Similar monitoring activities would occur on the dispatch side of a watch ring in FIG. 3, when monitor agent B.sub.2 traps completion signal, t, d, or h sent by port P. As soon as B.sub.2 received a completion signal from port P, it would send start signal s to ports that were tuned to it. The monitor objects might then read and modify data written into the write-memory of clock-ring C by the parent object of port P. As in the case of delivery side observation, the monitor objects could do any thing they liked with the output data. When B.sub.2 in FIG. 3 receives completion signal from monitor ports that are tuned to it, and senses satisfaction of agreement protocol, it will send to agent A (or to its next observer agent on the dispatch side, if there was one), the same completion signal that it had received earlier from port P.

1.12. Monitor Agents on Clock-rings

CSM (Communications System Manager) may be used to attach monitor agents to clock-rings as shown in FIG. 6 to observe data flowing into and out of a group of objects, G. In FIG. 6, G contains objects O.sub.1, O.sub.2 and O.sub.3. Monitor agent B.sub.1 would monitor data flowing out of group G, and monitor agent B.sub.2 would monitor data flowing into group G. The monitoring objects M1 and M.sub.2 in the figure could do anything they liked with data they read from read/write memories of clock-ring C. After completing their monitoring tasks they would send completion signals back to monitor agents, which would then take appropriate actions as described in Section 1.10, after sensing satisfaction of agreement protocol.

1.13. Suspending Computations Temporarily through Probe Networks

Any of the monitor networks, introduced in Sections 1.11 and 1.12, may be used to temporarily suspend computations if the monitoring objects in the network are pr