United States Patent Application 20020087949
Kind Code A1
Golender, Valery; et al. July 4, 2002

System and method for software diagnostics using a combination of visual and dynamic tracing
Abstract
A software system is disclosed that provides remote troubleshooting and tracing of the execution of computer programs. The software system allows a remote software developer or help desk person to troubleshoot computer environment and installation problems such as missing or corrupted environment variables, files, DLLs, registry entries, and the like. In one embodiment the software system includes an information-gathering module that gathers run-time information about program execution and about the program's interaction with the operating system and system resources. The information-gathering module also monitors user actions and captures screen output. The information-gathering module passes the gathered information to an information-display module. The information-display module allows a support technician (e.g., a software developer, a help desk person, etc.) to see the user interactions with the program and the corresponding reactions of the system. In one embodiment, the information-display module allows the support technician to remotely view environment variables, file access operations, system interactions, and user interactions that occur on the user's computer, and to locate failed operations that cause execution problems.

Inventors: Golender; Valery (Kfar Saba, IL), Moshe; Ido Ben (Herzlia, IL), Wygodny; Shlomo (Ramut Hasharon, IL)
Correspondence Name and Address: 620 NEWPORT CENTER DRIVE SIXTEENTH FLOOR
KNOBBE MARTENS OLSON & BEAR LLP
NEWPORT BEACH
CA
92660
US
Series Code: 799338
Filed: March 5, 2001
U.S. Current Class: 717/124
U.S. Class at Publication: 717/124
Intern'l Class: G06F 009/44

Claims


What is claimed is:
1. A software system that facilitates the process of identifying and isolating software execution problems within a program without requiring modifications to the executable of the client program, said system comprising: an information-gathering module that monitors selected events occurring during execution of the client program and stores data describing said events in a log file, said information-gathering module configured to monitor API events, message events, and program events, said information-gathering module further configured to obtain screen captures during execution of the client program, said information-gathering module configured to connect to said client program at runtime by hooking an in-memory executable image of said client program; and an information-display module that displays information from said log file, said information-display module configured to list events logged in said log file, said information-display module further configured to display screen captures obtained by said information-gathering module, said information-display module configured to run on a different computer than said information-gathering module, thereby allowing remote troubleshooting of said client program.

2. The software system of claim 1, wherein said information-gathering module monitors file access operations.

3. The software system of claim 1, wherein said information-gathering module monitors and highlights failed system interactions.

4. The software system of claim 1, wherein said information-display module displays screen captures synchronized with logged events.

5. The software system of claim 1, wherein said information-display module replays screen captures in sequence.

6. The software system of claim 1, wherein said information-display module replays screen captures in sequence to produce a screen capture sequence, said information-display module also showing event information in sequence to produce an event information sequence, said event information sequence synchronized with said screen capture sequence.

7. The software system of claim 1, wherein said information-gathering module monitors attempts by said client program to access a Windows registry.

8. The software system of claim 1, wherein said information-gathering module monitors use of DLLs.

9. The software system of claim 1, wherein said information-gathering module monitors attempts by said client program to spawn a subprocess or create a thread.

10. The software system of claim 1, wherein said information-gathering module monitors database operations.

11. The software system of claim 1, wherein said information-display module includes filters to control displaying of events in said log file.

12. The software system of claim 1, wherein said information-gathering module monitors interprocess communication performed by said client program.

13. The software system of claim 12, wherein said interprocess communication includes communication using COM.

14. The software system of claim 12, wherein said interprocess communication includes communication using DCOM.

15. The software system of claim 12, wherein said interprocess communication includes communication using semaphores.

16. The software system of claim 12, wherein said interprocess communication includes communication using shared memory.

17. The software system of claim 12, wherein said interprocess communication includes communication using network protocols.

18. A method for remotely troubleshooting problems occurring when trying to execute a client program on a remote computer, comprising: loading a client program on a remote computer to create an in-memory executable image of said client program; loading an information-gathering module on said remote computer, said information-gathering module configured to connect to said client program at runtime by hooking said in-memory executable image, said information-gathering module configured to monitor selected events occurring during execution of said client program and store event data describing said events, said information-gathering module configured to monitor API events, message events, and program events, said information-gathering module further configured to obtain screen captures during execution of said client program; loading an information-display module on a second computer; and sending said event data to said information-display module, said information-display module configured to receive said event data and list events logged in said event data, said information-display module further configured to display screen captures obtained by said information-gathering module.

19. The method of claim 18, wherein said information-gathering module monitors file access operations.

20. The method of claim 18, wherein said information-gathering module monitors attempts by said client program to access non-existent files.

21. The method of claim 18, wherein said information-gathering module monitors attempts by said client program to access protected files.

22. The method of claim 18, wherein said information-gathering module monitors attempts by said client program to write to a full disk.

23. The method of claim 18, wherein said information-gathering module monitors attempts by said client program to access locked files.

24. The method of claim 18, wherein said information-gathering module monitors attempts by said client program to access one or more registry entries.

25. The method of claim 18, wherein said information-gathering module monitors use of one or more DLLs.

26. The method of claim 18, wherein said information-gathering module monitors attempts by said client program to spawn a subprocess.

27. The method of claim 18, wherein said information-gathering module monitors attempts by said client program to create a thread.

28. The method of claim 18, wherein said information-gathering module monitors interprocess communication performed by said client program.

29. The method of claim 18, further comprising the step of defining one or more filters to control how said information-display module displays said event data.

30. The method of claim 18, wherein said information-display module creates a first window to display a list of events monitored by said information-gathering module, and wherein said information-display module creates a second window to display screen capture information from said remote computer.

31. The method of claim 30, wherein said information-display module creates a third window to display a list of DLLs used by said client program.

32. A system for remotely troubleshooting problems occurring when trying to execute a client program on a remote computer, comprising: means for monitoring events and capturing screenshots occurring during execution of a client program and storing data describing said events, said events including API events, message events, and program events; means for hooking said means for monitoring to an in-memory executable copy of said client program; and an information-display module that displays said data describing said events, said information-display module configured to list events in chronological order, said information-display module further configured to display screen captures obtained by said information-gathering module.

Description



REFERENCE TO RELATED APPLICATIONS

[0001] The present application claims priority benefit of Provisional Application No. 60/186,636, filed Mar. 3, 2000, titled "SYSTEM AND METHOD FOR SOFTWARE DIAGNOSTICS USING COMBINATION OF VISUAL AND DYNAMIC TRACING," the disclosure of which is incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates to software tools that assist software developers and help desk personnel in monitoring and analyzing the execution of computer programs running on remote computers, and in detecting and troubleshooting execution problems.

[0004] 2. Description of the Related Art

[0005] The problem of ascertaining why a particular piece of software is malfunctioning is currently solved by a number of techniques including static analysis of configuration problems and conventional debugging techniques such as run-time debugging and tracing. Despite the significant diversity in software tracing and debugging programs ("debuggers"), virtually all debuggers share a common operational model: the developer notices the presence of a bug during normal execution, and then uses the debugger to examine the program's behavior. The second part of this process is usually accomplished by setting a breakpoint near a possibly flawed section of code, and upon reaching the breakpoint, single-stepping forward through the section of code to evaluate the cause of the problem.

[0006] Two significant problems arise in using this model. First, the developer needs to know in advance where the problem resides in order to set an appropriate breakpoint location. Setting such a breakpoint can be difficult when working with an event-driven system (such as the Microsoft Windows.RTM. operating system), because the developer does not always know which of the event handlers (callbacks) will be called.

[0007] The second problem is that some bugs give rise to actual errors only during specific execution conditions, and these conditions cannot always be reproduced during the debugging process. For example, a program error that occurs during normal execution may not occur during execution under the debugger, since the debugger affects the execution of the program. This situation is analogous to the famous "Heisenberg effect" in physics: the tool used to analyze a phenomenon actually changes its characteristics. The Heisenberg effect is especially apparent during the debugging of time-dependent applications, since these applications rely on specific timing and synchronization conditions that are significantly altered when the program is executed step-by-step with the debugger.

[0008] An example of this second type of problem is commonly encountered when software developers attempt to diagnose problems that have been identified by customers and other end users. Quite often, software problems appear for the first time at a customer's site. When trying to debug these problems at the development site (typically in response to a bug report), the developer often discovers that the problem cannot be reproduced. The reasons for this inability to reproduce the bug may range from an inaccurate description given by the customer, to a difference in environments such as files, memory size, system library versions, and configuration information. Distributed, client/server, and parallel systems, especially multi-threaded and multi-process systems, are notorious for having non-reproducible problems because these systems depend heavily on timing and synchronization sequences that cannot easily be duplicated.

[0009] When a bug cannot be reproduced at the development site, the developer normally cannot use a debugger, and generally must resort to the tedious, and often unsuccessful, task of manually analyzing the source code. Alternatively, a member of the software development group can be sent to the customer site to debug the program on the computer system on which the bug was detected. Unfortunately, sending a developer to a customer's site is often prohibitively time consuming and expensive, and the process of setting up a debugging environment (source code files, compiler, debugger, etc.) at the customer site can be burdensome to the customer.

[0010] Some software developers attempt to resolve the problem of monitoring the execution of an application by imbedding tracing code in the source code of the application. The imbedded tracing code is designed to provide information regarding the execution of the application. Often, this imbedded code is no more than code to print messages which are conditioned by some flag that can be enabled in response to a user request. Unfortunately, the imbedded code solution depends on inserting the tracing code into the source prior to compiling and linking the shipped version of the application. To be effective, the imbedded code must be placed logically near a bug in the source code so that the trace data will provide the necessary information. Trying to anticipate where a bug will occur is, in general, a futile task. Often there is no imbedded code where it is needed, and once the application has been shipped it is too late to add the desired code.

[0011] Another drawback of current monitoring systems is the inability to correctly handle parallel execution, such as in a multiprocessor system. The monitoring systems mentioned above are designed for serial execution (single processor) architectures. Using serial techniques for parallel systems may cause several problems. First, the sampling activities in the various parallel entities (threads or processes) may interfere with one another (e.g., the trace data produced by one entity may be overwritten by another entity). Second, the systems used to analyze the trace data cannot assume that the trace is sequential. For example, the function call graph in a serial environment is a simple tree. In a parallel processing environment, the function call graph is no longer a simple tree, but a collection of trees with a time-based relationship between them. Displaying the trace data as a separate calling tree for each entity is not appropriate, because it does not reveal when, during the execution, context switches occurred between the various parallel entities. The location of the context switches in the execution sequence can be very important for debugging problems related to parallel processing.
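One way to keep the time-based relationship between per-thread traces is to merge them into a single time-ordered sequence instead of showing one tree per thread. The following is a minimal sketch, not the BugTrapper implementation. It assumes that every trace record carries a timestamp from a clock shared by all threads, and the names `TraceRecord` and `merge_thread_traces` are hypothetical:

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    timestamp: int  # hypothetical tick count from a clock shared by all threads
    thread_id: int
    event: str      # e.g. "enter main", "exit worker"


def merge_thread_traces(per_thread):
    """Merge per-thread traces (each already sorted by timestamp) into a single
    time-ordered list of (record, thread_changed) pairs. thread_changed is True
    when the record comes from a different thread than the record before it."""
    merged = heapq.merge(*per_thread.values(), key=lambda r: r.timestamp)
    result = []
    previous_thread = None
    for record in merged:
        thread_changed = previous_thread is not None and record.thread_id != previous_thread
        result.append((record, thread_changed))
        previous_thread = record.thread_id
    return result


traces = {
    1: [TraceRecord(10, 1, "enter main"), TraceRecord(40, 1, "exit main")],
    2: [TraceRecord(20, 2, "enter worker"), TraceRecord(30, 2, "exit worker")],
}
for record, thread_changed in merge_thread_traces(traces):
    marker = "  <- switch" if thread_changed else ""
    print(f"{record.timestamp:>4} thread {record.thread_id}: {record.event}{marker}")
```

On a single processor, a thread change between consecutive records marks a context switch. On a multiprocessor, it may instead reflect genuinely parallel execution, so the flag should be read as "execution moved to another thread" rather than as proof of a context switch.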

[0012] Moreover, the computing model used in the Microsoft Windows environment, which is based on the use of numerous sophisticated and error-prone applications with many components interacting in a complex way, requires a significant effort for system servicing and support. Many Windows problems experienced by users are software configuration errors that commonly occur when the users add new programs and devices to their computers. Problems also occur due to the corruption of important system files, resources, or setups. Another important source of software malfunctioning is "unexpected" user behavior that was not envisioned by the software developers (as occurs when, for example, the user inadvertently deletes a file needed by the application).

SUMMARY OF THE INVENTION

[0013] The present invention overcomes these and other problems associated with debugging and tracing the execution of computer programs. The present invention provides features that allow a remote software developer or help desk person to debug configuration problems such as missing or corrupted environment variables, files, DLLs, registry entries, and the like. In one embodiment, a "visual problem monitor" system includes an information-gathering module that gathers run-time information about program execution and about the program's interaction with the operating system and system resources. The information-gathering module also monitors user actions and captures screen output. In one embodiment, file interactions, DLL loading and/or registry accesses are monitored non-intrusively. In one embodiment, the relevant support information captured by the information-gathering module is saved in a log file. The information-gathering module passes the gathered information to an information-display module. In one embodiment, the information-gathering module attaches to the running program using a hooking process. The program being monitored need not be specially modified or adapted to allow the information-gathering module to attach.

[0014] The information-display module allows a support technician (e.g., a software developer, a help desk person, etc.) to see the user interactions with the program and corresponding reactions of the system. This eliminates the "questions and answers" game that support personnel often play with users in order to understand what the user did and what happened on the user's PC. In one embodiment, the information-display module allows the support technician to remotely view environment variables, file access operations, system interactions, and user interactions that occur on the user's computer. In one embodiment, the information-display module allows the support technician to remotely view crash information (in the event of a crash on the user's computer), system information from the user's computer, and screen captures from the user's computer.
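To show a technician what the user saw when an operation failed, the display module has to relate each logged event to the screen capture that was current at that moment (compare claims 3 and 4). The following minimal sketch assumes that events and captures share a millisecond timestamp and that captures are kept sorted by time. The classes and function names are hypothetical, and the API names in the usage lines are just example strings:

```python
import bisect
from dataclasses import dataclass


@dataclass
class LoggedEvent:
    time_ms: int
    kind: str          # e.g. "API", "message", "program" (the categories of claim 1)
    description: str
    failed: bool


@dataclass
class ScreenCapture:
    time_ms: int
    image_id: str      # hypothetical handle to the stored image


def capture_for_event(event, captures):
    """Return the most recent capture taken at or before the event, or None.
    `captures` must be sorted by time_ms."""
    times = [c.time_ms for c in captures]
    index = bisect.bisect_right(times, event.time_ms) - 1
    return captures[index] if index >= 0 else None


def failed_events_with_screens(events, captures):
    """Pair every failed event with the screen that was visible when it happened."""
    return [(e, capture_for_event(e, captures)) for e in events if e.failed]


events = [
    LoggedEvent(100, "API", "CreateFile settings.ini", True),
    LoggedEvent(250, "API", "RegOpenKeyEx HKCU\\Software\\Example", False),
    LoggedEvent(300, "API", "LoadLibrary example.dll", True),
]
captures = [ScreenCapture(90, "cap1"), ScreenCapture(280, "cap2")]
for event, capture in failed_events_with_screens(events, captures):
    print(event.description, "->", capture.image_id if capture else "no capture")
```

A binary search (`bisect`) keeps this lookup cheap even for long logs. A real viewer would replay the capture sequence and step through the event list together.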

[0015] One aspect of the present invention is a software system that facilitates the process of identifying and isolating bugs within a client program by allowing a developer to trace the execution paths of the client. The tracing can be performed without requiring modifications to the executable or source code files of the client program. In one embodiment, the system interaction tracing can be performed even without any knowledge of the source code or debug information of the client. Preferably, the trace data collected during the tracing operation is collected according to instructions in a trace control dataset, which is preferably stored in a Trace Control Information (TCI) file. Typically, the developer generates the TCI file by using a trace options editor program having a graphical user interface. The options editor displays the client's source code representation on a display screen together with controls that allow the software developer to interactively specify the source code and data elements to be traced. The options editor may use information created by a compiler or linker, such as debug information, in order to provide more information about the client and thereby make the process of selecting trace options easier. Once the trace options are selected, the client is run on a computer, and a tracing library is used to attach to the memory image of the client (the client process). The tracing library is configured to monitor execution of the client, and to collect trace data, based on selections in the trace options. The trace data collected by the tracing library is written to an encoded buffer in memory. The data in the buffer may optionally be saved to a trace log file for later use.
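The paragraph describes the TCI file only in terms of what it controls: which code elements are traced and what data is collected. The following minimal sketch of such a trace control dataset uses a hypothetical JSON layout. All field and class names are invented for illustration, and the actual TCI file is described later as using an encoded format, so plain JSON is chosen here only for readability:

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class FunctionTraceOption:
    name: str                        # qualified function name to instrument
    record_arguments: bool = False
    record_return_value: bool = False


@dataclass
class TraceControlInfo:
    client_executable: str
    functions: list = field(default_factory=list)   # list of FunctionTraceOption

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "TraceControlInfo":
        raw = json.loads(text)
        return cls(raw["client_executable"],
                   [FunctionTraceOption(**f) for f in raw["functions"]])

    def option_for(self, function_name: str):
        """Return the trace option for a function, or None if it is not traced."""
        for option in self.functions:
            if option.name == function_name:
                return option
        return None


tci = TraceControlInfo("client.exe", [
    FunctionTraceOption("Parser::Load", record_arguments=True),
    FunctionTraceOption("Report::Print", record_return_value=True),
])
print(tci.to_json())
```

Under this design, the options editor would write the dataset and the tracing library would consult `option_for` when deciding what to record at each instrumented function.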

[0016] The developer then uses a trace analyzer program, also having a graphical user interface, to decode the trace information into a human-readable form, again using the debug information, and to display the translated trace information on the display screen so that the developer can analyze the execution of the client program. In a preferred embodiment, the trace options editor and the trace analyzer are combined into a single program called the analyzer. The analyzer is preferably configured to run under the control of a multi-process operating system and to allow the developer to trace multiple threads and multiple processes. The tracing library is preferably configured to run in the same process memory space as the client, thereby tracing the execution of the client program without the need for context switches.

[0017] In one embodiment, the software system provides a remote mode that enables the client program to be traced at a remote site, such as by the customer at a remote customer site, and then analyzed at the developer site. When the remote mode is used, the developer sends the TCI file for the particular client to a remote user site together with a small executable file called the tracing "agent." The agent is adapted to be used at the remote user site as a stand-alone tracing component that enables a remote customer, who does not have access to the source code of the client, to generate a trace file that represents execution of the client application at the remote site. The trace file is then sent to the developer site (such as by email), and is analyzed by the software developer using the analyzer. The remote mode thus enables the software developer to analyze how the client program is operating at the remote site, without the need to visit the remote site, and without exposing to the customer the source code or other confidential details of the client program.

[0018] The software system also preferably implements an online mode that enables the software developer to interactively trace and analyze the execution of the client. When the software system is used in the online mode, the analyzer and agent are effectively combined into one program that a developer can use to generate trace options, run and trace the client, and display the trace results in near real-time on the display screen during execution of the client program.

[0019] In one embodiment, the support technician typically uses a default TCI file that allows the trace system to trace system interactions and other important API functions without access to source code and/or debug information. This is useful for troubleshooting commercial applications such as Microsoft Office, Internet Information Server, CRM and ERP systems, other legacy products, and the like.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] A software system which embodies the various features of the invention will now be described with reference to the following drawings.

[0021] FIG. 1A is a block diagram illustrating the use of the system to create a trace control information file.

[0022] FIG. 1B is a block diagram illustrating the use of the system in remote mode.

[0023] FIG. 1C is a block diagram illustrating the use of the system to analyze a trace log file.

[0024] FIG. 2 is a block diagram illustrating the use of the system in online mode.

[0025] FIG. 3A is an illustration of a typical main frame window provided by the system's trace analyzer module.

[0026] FIG. 3B is an illustration of a typical main frame window showing multiple threads.

[0027] FIG. 4 illustrates a process list window that lists the processes to be traced.

[0028] FIG. 5 illustrates the trace options window that allows a developer to select the functions to be traced and the information to be collected during the trace.

[0029] FIG. 6 illustrates a file page window that provides a hierarchical tree of trace objects listed according to hierarchical level.

[0030] FIG. 7 illustrates a class page window that provides a hierarchical tree of trace objects sorted by class.

[0031] FIG. 8 illustrates the process page window that provides a hierarchical tree that displays the traced processes and the threads of each process.

[0032] FIG. 9 illustrates the running process window that allows the user to attach to and start tracing a process that is already running.

[0033] FIG. 10 illustrates the start process window that allows the user to load an executable file, attach to the loaded file, execute the loaded file, and start tracing the loaded file.

[0034] FIG. 11 shows a trace detail pane that displays a C++ class having several members and methods, a class derived from other classes, and classes as members of a class.

[0035] FIG. 12 illustrates a trace tree pane, showing a break (or tear) in the trace tree where tracing was stopped and then restarted.

[0036] FIG. 13 is a flowchart which illustrates the process of attaching to (hooking) a running process.

[0037] FIG. 14 is a flowchart which illustrates the process of loading an executable file and attaching to (hooking) the program.

[0038] FIG. 15 is a block diagram showing the architecture of the visual problem monitor system including the information-gathering module and the information-display module.

[0039] FIG. 16 shows a multi-window display provided by the information-display module.

[0040] FIG. 17 is a flowchart illustrating the use of the system to solve software support problems.

[0041] In the drawings, like reference numbers are used to indicate like or functionally similar elements. In addition, the first digit or digits of each reference number generally indicate the figure number in which the referenced item first appears.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

[0042] The present invention provides a new model for software diagnostics by tracing the execution path of a computer program and user interaction with the computer program. In the preferred embodiment of the invention, this tracing model is implemented within a set of tracing and debugging tools that are collectively referred to as the BugTrapper system ("BugTrapper"). The BugTrapper tools are used to monitor and analyze the execution of a computer program, referred to as a client. One feature of the BugTrapper is that it does not require special instructions or commands to be imbedded within the source code of the client, and it does not require any modifications to be made to the source or executable files of the client. "Tracing," or "to trace," refers generally to the process of using a monitoring program to monitor and record information about the execution of the client while the client is running. A "trace" generally refers to the information recorded during tracing. Unlike conventional debuggers that use breakpoints to stop the execution of a client, the BugTrapper tools collect data while the client is running. Using a process called "attaching", the BugTrapper tools instrument the client by inserting interrupt instructions at strategic points defined by the developer (such as function entry points) in the memory image of the client. This instrumentation process is analogous to the process of connecting a logic analyzer to a circuit board by connecting probes to test points on the circuit board. When these interrupts are triggered, the BugTrapper collects trace information about the client without the need for a context switch, and then allows the client to continue running.
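The BugTrapper inserts interrupt instructions into the machine code of the client's memory image. The following sketch is only a loose analogy in Python, not the patent's mechanism. It replaces selected function objects in an already-loaded namespace with wrappers ("probes") that record entry, exit, or exceptions and then let the original function run. The source is never edited, and detaching restores the originals. The names `attach`, `_make_probe`, and the log tuple format are hypothetical:

```python
import functools


def _make_probe(name, original, trace_log):
    """Build a wrapper that records entry, exit, or exception, then defers to the original."""
    @functools.wraps(original)
    def probe(*args, **kwargs):
        trace_log.append(("enter", name, args))
        try:
            result = original(*args, **kwargs)
        except Exception as exc:
            trace_log.append(("raise", name, repr(exc)))
            raise
        trace_log.append(("exit", name, result))
        return result
    return probe


def attach(namespace, function_names, trace_log):
    """Replace the named functions on `namespace` (e.g. a loaded module) with
    probes, leaving the source code untouched. Returns a detach callable."""
    originals = {name: getattr(namespace, name) for name in function_names}
    for name, original in originals.items():
        setattr(namespace, name, _make_probe(name, original, trace_log))

    def detach():
        for name, original in originals.items():
            setattr(namespace, name, original)

    return detach
```

One limitation of this analogy is that code holding a direct reference to the original function (for example, after `from module import func`) bypasses the probe. Patching the instruction stream itself, as described above, does not have that gap.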

[0043] The BugTrapper implementations described herein operate under, and are therefore disclosed in terms of, the Windows-NT/2000 and Windows-95/98 operating systems and the like. It will be apparent, however, that the underlying techniques can be implemented using other operating systems that provide similar services. Other embodiments of the invention will be apparent from the following detailed description of the BugTrapper.

[0044] Overview of BugTrapper System and User Model

[0045] The BugTrapper provides two modes of use: remote mode and online mode. As discussed in more detail in the text accompanying FIGS. 1A-1C, in remote mode a developer can trace the remote execution of a program that has been shipped to an end user (e.g., a customer or beta user) without providing a special version of the code to the user, without visiting the user's site, and without exposing the source-code-level details of the program to the user. The system can also be used in an online mode, in which the developer can interactively trace a program and view the trace results in real time.

[0046] Remote Mode

[0047] Remote mode involves three basic steps shown in FIGS. 1A through 1C. In step 1, shown in FIG. 1A, a developer 112 uses a program called the BugTrapper analyzer 106 to create a file called a trace control information (TCI) file 120. The TCI file 120 contains instructions that specify what information is to be collected from a program to be traced (the client). The analyzer 106 obtains information about the client from a build (e.g., compile and link) by-product, such as a link map file, or, as in the preferred embodiment, a debug information file 121. Typically, the debug information file 121 will be created by a compiler and will contain information such as the names and addresses of software modules, call windows, etc. for the specific client. The developer 112 then sends the TCI file 120 and a small tracing application called the agent 104 to a user 110 as shown in FIG. 1B. The user 110 runs the agent 104 and the client 102 and instructs the agent 104 to attach to the client 102. The agent attaches to the client 102 by loading a client-side trace library 125 into the address space of the client 102. An agent-side trace library 124 is provided in the agent 104. The client-side trace library 125 and the agent-side trace library 124 are referred to collectively as the "trace library." The agent-side trace library 124 and the client-side trace library 125 exchange messages through normal interprocess communication mechanisms and through a shared memory trace buffer 105. The agent-side trace library 124 uses information from the TCI file 120 to attach the client-side trace library 125 into the client 102, and thereby obtain the trace information requested by the developer 112.

[0048] The client 102 and the client-side trace library 125 run in the same context so that the client 102 can signal the client-side trace library 125 without performing a context switch, and thus without incurring the overhead of a context switch. For the purposes herein, a context can be a process, a thread, or any other unit of dispatch in a computer operating system. The client 102 can be any type of software module, including but not limited to, an application program, a device driver, or a dynamic link library (DLL), or a combination thereof. The client 102 can run in a single thread, or in multiple processes and/or multiple threads.

[0049] In operation, the agent 104 attaches to the client 102 using a process known as "attaching." The agent 104 attaches to the client 102 either when the client 102 is being loaded or once the client 102 is running. Once attached, the agent 104 extracts trace information, such as execution paths, subroutine calls, and variable usage, from the client 102. Again, the TCI file 120 contains instructions to the client-side trace library 125 regarding the trace data to collect. The trace data collected by the client-side trace library 125 is written to the trace buffer 105. On command from the user 110 (such as when a bug manifests itself), the agent 104 copies the contents of the trace buffer 105 to a trace log file 122. In some cases, the log data is written to a file automatically, such as when the client terminates. The user 110 sends the trace log file 122 back to the developer 112. As shown in FIG. 1C, the developer 112 then uses the analyzer 106 to view the information contained in the trace log file 122. When generating screen displays for the developer 112, the analyzer 106 obtains information from the debug information file 121. Since the analyzer 106 is used both to create the TCI file 120 and to view the results in the trace log file 122, the developer can edit the TCI file 120 or create a new TCI file 120 while viewing results from a trace log file 122.
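The buffer-then-dump flow above can be sketched as follows. The patent does not say how trace buffer 105 behaves when it fills. This sketch assumes a bounded buffer that discards the oldest records, so the most recent history leading up to a bug is kept. It also runs in a single process, whereas buffer 105 is shared memory between the agent and the client. The class name, method names, and one-JSON-record-per-line log format are hypothetical:

```python
import collections
import json


class TraceBuffer:
    """Bounded in-memory trace buffer. When full, the oldest records are
    dropped, so the most recent execution history is always retained."""

    def __init__(self, capacity):
        self._records = collections.deque(maxlen=capacity)

    def write(self, record):
        # Called by the tracing code for every traced event.
        self._records.append(record)

    def snapshot(self):
        return list(self._records)

    def dump_lines(self):
        # One JSON object per line; the format is a hypothetical choice.
        return [json.dumps(record) for record in self._records]

    def dump(self, path):
        """Copy the buffer contents to a trace log file, e.g. on user command."""
        with open(path, "w", encoding="utf-8") as log:
            for line in self.dump_lines():
                log.write(line + "\n")
        return len(self._records)
```

The automatic dump on client termination could be approximated by registering `buffer.dump` with a process-exit hook. Using `collections.deque(maxlen=...)` gives the drop-oldest behavior without any manual index arithmetic.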

[0050] Remote mode is used primarily to support users 110 who are located remotely from the developer 112. In remote mode, the agent 104 is provided to the user 110 as a stand-alone component that enables the user to generate a trace log file that represents the execution of the client. Because the TCI file 120 and the trace log file 122 may both contain data that discloses secrets about the internal operation of the client 102, both files are written in an encoded format that is not readily decipherable by the user 110. Thus, providing the TCI file 120 and the agent 104 to the user 110 neither readily reveals secrets about the client 102 nor helps the user 110 reverse engineer the client 102. The agent 104 traces the client without any modification of the client. The developer 112 does not need to build a special version of the client 102 executable file and send it to the customer, nor does the customer need to pre-process the client executable file before tracing.

[0051] From the perspective of the remote user, the agent 104 acts essentially as a black box that records the execution path of the client 102. As explained above, the trace itself is not displayed on the screen, but immediately after the bug recurs in the application, the user 110 can dump the trace data to the trace log file 122 and send this file to the developer 112 (such as by email) for analysis. The developer 112 then uses the analyzer 106 to view the trace log file created by the user 110 and identify the problematic execution sequence. In remote mode, the user 110 does not need access to the source code or the debug information. The agent 104, the TCI file 120, and the trace log file 122 are preferably small enough to be sent via email between the developer 112 and the user 110. Further details regarding the remote mode of operation are provided in the sections below.

[0052] Online Mode

[0053] As shown in FIG. 2, the BugTrapper may also be used in an online mode rather than the remote mode shown in the previous figures. In this mode, the developer 112 uses the BugTrapper to analyze a client 102 locally; the client will typically be a program that is still being developed. For example, online mode can serve during development as a preliminary or complementary step to using a conventional debugger. In many cases it is hard to tell exactly where a bug resides and, therefore, where breakpoints should be inserted. Online mode provides the proper basis for setting these breakpoints. Later, if further analysis is required, a more conventional debugger can be used. In online mode, the analyzer 106 performs all of its normal operations (e.g. creating the TCI file 120 and viewing the trace results) as well as the operations performed by the agent 104 in remote mode, so the agent 104 is not needed. The developer 112 uses the analyzer 106 to run the client 102 and attach the client-side trace library 125 to the client 102. The analyzer 106 reads the trace buffer 105 in near real-time and immediately displays the trace information to the developer 112.

[0054] The developer 112 uses the analyzer 106 to interactively create trace control information (TCI). The TCI may be sent to the client-side trace library 125 via file input/output operations or through conventional inter-process communication mechanisms such as shared memory, message passing, or remote procedure calls. The TCI indicates to the client-side trace library 125 which portions of the client 102 to trace and when tracing is to be performed. As the client program 102 runs, the client-side trace library 125 collects the trace information and relays it back to the analyzer 106, which displays it in near real-time within one or more windows of the BugTrapper.
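As a rough illustration of what trace control information might carry, the sketch below models each TCI entry as a trace-point address plus flags saying what to capture there, and round-trips the entries through a plain text stream standing in for the file input/output path. The field names and the line format are hypothetical; the actual TCI file 120 is written in an encoded format, as described above.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical TCI entry: where to place a trace point and what to capture there.
struct TciEntry {
    std::uint64_t address;   // code address of the trace point
    bool capture_args;       // record function arguments at this point
    bool capture_registers;  // record processor registers at this point
};

// Writes one entry per line: "<hex address> <args 0|1> <regs 0|1>".
std::string serialize(const std::vector<TciEntry>& entries) {
    std::ostringstream out;
    for (const auto& e : entries)
        out << std::hex << e.address << std::dec << ' '
            << e.capture_args << ' ' << e.capture_registers << '\n';
    return out.str();
}

// Parses the format produced by serialize(); stops at the first malformed line.
std::vector<TciEntry> parse(const std::string& text) {
    std::istringstream in(text);
    std::vector<TciEntry> entries;
    TciEntry e{};
    while (in >> std::hex >> e.address >> std::dec >> e.capture_args >> e.capture_registers)
        entries.push_back(e);
    return entries;
}
```

The same structure could equally be passed over shared memory or another IPC channel; only the transport changes.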

[0055] Operational Overview of the Tracing Function

[0056] Regardless of which operational mode is used (online or remote), the client 102 is run in conjunction with the client-side trace library 125. As described in detail below, the client-side trace library 125 is attached to the in-memory image of the client 102 and generates trace information that describes the execution of the client 102. The TCI file 120, provided by the developer 112, specifies where tracing is to take place and what information will be stored. Because the client is traced without the need for context switches, the effect of this tracing operation on the performance of the client 102 is minimal, so that even time-dependent bugs can be reliably diagnosed. As described below, this process does not require any modification to the source or object code files of the client 102, and can therefore be used with a client 102 that was not designed to be traced or debugged.

[0057] The analyzer 106 is used to analyze the trace data and isolate the bug. The developer 112 may either analyze the trace data as it is generated (online mode) or analyze trace data stored in the trace log file 122 (mainly remote mode). As described below, the assembly level information in the trace log file is converted back to a source level format using the same debug information used to create the TCI file 120. During the trace analysis process, the analyzer 106 provides the developer 112 with execution analysis options that are similar to those of conventional debuggers, including options for single stepping and running forward through the traced execution of the client 102 while monitoring program variables. In addition, the analyzer 106 allows the developer 112 to step backward in the trace, and to search for breakpoints both in the future and in the past.

[0058] The attaching mechanism used to attach the client-side trace library 125 to the client 102 involves replacing selected object code instructions (or fields of such instructions) in the memory image of the client 102 with interrupt (INT) instructions to create trace points. The locations of the interrupts are specified by the TCI file 120 that is created for the specific client 102. When such an interrupt instruction is executed, control branches to the client-side trace library 125. The client-side trace library 125 logs the event of passing the trace point location and captures pre-specified state information, such as the values of specific program variables and microprocessor registers. The instructions that are replaced by the interrupt instructions are kept in a separate data structure to preserve the functionality of the application.
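The patching step can be sketched on a simulated byte buffer. This assumes an x86 target and the one-byte INT 3 opcode (0xCC), a common choice for software breakpoints that the text itself does not name. `CodeImage` is a hypothetical class; a real tracer would patch the live process image and, after handling the interrupt, execute the displaced instruction before resuming the client.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

constexpr std::uint8_t kInt3 = 0xCC;  // x86 one-byte breakpoint (INT 3) opcode

// Simulated in-memory code image with a side table of displaced bytes.
class CodeImage {
public:
    explicit CodeImage(std::vector<std::uint8_t> bytes) : bytes_(std::move(bytes)) {}

    // Replaces the byte at `offset` with INT 3, remembering the original byte
    // so the displaced instruction can still be executed or restored later.
    void set_trace_point(std::size_t offset) {
        if (offset >= bytes_.size()) throw std::out_of_range("trace point outside image");
        if (saved_.count(offset)) return;  // already patched
        saved_[offset] = bytes_[offset];
        bytes_[offset] = kInt3;
    }

    // Restores the original byte, e.g. when tracing of this point is switched off.
    void remove_trace_point(std::size_t offset) {
        auto it = saved_.find(offset);
        if (it == saved_.end()) return;
        bytes_[offset] = it->second;
        saved_.erase(it);
    }

    std::uint8_t at(std::size_t offset) const { return bytes_.at(offset); }

    // The byte the program really "has" at `offset`, patched or not.
    std::uint8_t original_at(std::size_t offset) const {
        auto it = saved_.find(offset);
        return it == saved_.end() ? bytes_.at(offset) : it->second;
    }

private:
    std::vector<std::uint8_t> bytes_;
    std::map<std::size_t, std::uint8_t> saved_;  // offset -> displaced byte
};
```

Keeping the displaced bytes in a separate map, rather than in the image, mirrors the separate data structure mentioned in the paragraph above.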

[0059] Overview of the Analyzer User Interface

[0060] The analyzer 106 comprises a User Interface module that reads trace data, either from the trace buffer 105 (during online mode tracing) or from the trace log file 122 (e.g. after remote tracing), and displays the data in a format, such as a trace tree, that shows the sequence of traced events that occurred during execution of the client 102. Much of the trace data consists of assembly addresses. With reference to FIG. 1C, the analyzer 106 uses the debug information 121 to translate the traced assembly addresses into comprehensible strings that are meaningful to the developer. To save memory and improve performance, this translation is preferably done only for the portion of the trace data that is displayed at any given time, not for the whole database of trace data. Thus, for example, when formatting a screen display, only the trace data needed for the current display is read from the log file 122. This allows the analyzer 106 to display data from a trace log file 122 containing more than a million trace records.
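The on-demand translation described above amounts to formatting only the rows in the visible window. In this sketch the trace is an in-memory vector of addresses and `translate` stands in for the debug-information lookup; in the analyzer, the records would instead be read from the trace log file 122. The function name `visible_rows` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Formats only the rows currently visible in a view, so the cost of symbol
// translation depends on the window height, not on the size of the trace.
std::vector<std::string> visible_rows(
    const std::vector<std::uint64_t>& trace,  // raw trace addresses
    std::size_t first_row, std::size_t row_count,
    const std::function<std::string(std::uint64_t)>& translate) {
    std::vector<std::string> rows;
    std::size_t end = std::min(trace.size(), first_row + row_count);
    for (std::size_t i = first_row; i < end; ++i)
        rows.push_back(translate(trace[i]));
    return rows;
}
```

Scrolling simply calls `visible_rows` again with a new `first_row`; a million-record trace costs no more to display than a short one.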

[0061] The debug information 121 is preferably created by a compiler when the client is compiled. When creating the TCI file 120, the analyzer 106 uses the debug information 121 to translate function names and source lines to addresses. Conversely, when formatting a display for the user interface, the analyzer 106 uses the debug information 121 to translate addresses in the trace data back into function names and source lines. One skilled in the art will recognize that other build information may be used as well, including, for example, information in a linker map file and the Type Library information available in a Microsoft OLE-compliant executable.
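A minimal symbol table supporting both directions of translation might look like the following. It assumes the debug information has already been reduced to (start address, function name) pairs, ignores source lines and function sizes, and uses hypothetical names throughout; it is not a PDB or COFF reader.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical symbol table built from debug information. Function sizes are
// ignored, so an address past the end of the last function still maps to it.
class SymbolTable {
public:
    void add(std::uint64_t start, const std::string& name) {
        by_address_[start] = name;
        by_name_[name] = start;
    }

    // Function name -> address (the direction used when building a TCI file).
    std::optional<std::uint64_t> address_of(const std::string& name) const {
        auto it = by_name_.find(name);
        if (it == by_name_.end()) return std::nullopt;
        return it->second;
    }

    // Address -> "function+offset" (the direction used when displaying traces).
    std::string describe(std::uint64_t address) const {
        auto it = by_address_.upper_bound(address);  // first function starting after address
        if (it == by_address_.begin()) return "<unknown>";
        --it;                                         // function whose start precedes address
        std::uint64_t offset = address - it->first;
        return offset == 0 ? it->second : it->second + "+" + std::to_string(offset);
    }

private:
    std::map<std::uint64_t, std::string> by_address_;
    std::map<std::string, std::uint64_t> by_name_;
};
```

Because only the analyzer holds this table, the trace data sent from a user's machine can remain purely numeric.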

[0062] Preferably, the debug information is never used by the trace libraries 124, 125 or the agent 104, but only by the analyzer 106. This is desirable for speed, because access to debug information is typically relatively slow. It is also desirable for security, since there is no need to send the user 110 any symbolic information that might disclose confidential information about the client 102.

[0063] The analyzer 106 allows the developer 112 to open multiple trace tree windows and define a different filter (trace control instructions) for each window. When a trace record is read, each window's filter is preferably examined separately to determine whether the record should be displayed in that window. The filters from the various windows are combined to create the TCI file 120, which is read by the client-side trace library 125. In other words, the multiple windows with different filters are handled by the User Interface, while the client-side trace library 125 reads a single TCI file 120.
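One plausible way to combine per-window filters, assumed here because the text does not say how they are merged, is to trace the union of everything any window selects and let each window apply its own filter when displaying records. Sets of function names stand in for the trace control instructions, and all names are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-window filter: the set of functions that window displays.
using WindowFilter = std::set<std::string>;

// The trace library receives a single TCI: the union of every window's
// selection, so anything at least one window wants is traced.
std::set<std::string> combine_for_tci(const std::vector<WindowFilter>& windows) {
    std::set<std::string> traced;
    for (const auto& w : windows) traced.insert(w.begin(), w.end());
    return traced;
}

// Each window then checks every incoming record against its own filter.
bool window_shows(const WindowFilter& window, const std::string& function) {
    return window.count(function) != 0;
}
```

A record for `f1` is then traced once but appears only in the windows whose filters include it.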

[0064] FIG. 3A is an illustration of a typical frame window 300 provided by the analyzer 106. The analyzer frame window 300 displays similar information both when performing online tracing (online mode) and when displaying a trace log file (remote mode). The frame window 300 is a split frame having four panes: a trace tree pane 310, an "executable" pane 314, a trace detail pane 316, and a source pane 318. The analyzer frame 300 further provides a menu bar 304, a dockable toolbar 306, and a status bar 312. The menu bar 304 provides drop-down menus labeled "File," "Edit," "View," "Executable," and "Help." The trace tree pane 310 contains a thread caption bar 320, described below in connection with the Analyzer. Below the thread caption bar 320 is a trace tree 330. The trace tree 330 is a hierarchical tree control that graphically displays the current trace information for the execution thread indicated in the thread caption bar 320. The trace tree 330 displays, in a hierarchical tree graph, the sequence of function calls and returns (the dynamic call tree) in the executable programs (collectively the client 102) listed in the executable pane 314. Traced source lines also appear in the trace tree, between the call and return of the function in which the lines are located. FIG. 3A illustrates a single thread caption and trace tree combination (the items 320 and 330). However, multiple thread caption and trace tree combinations are displayed when there are context switches between multiple threads or processes.
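The dynamic call tree shown in the trace tree 330 can be rebuilt from an ordered stream of call and return events for one thread by keeping a stack of open calls. The `Event` and `CallNode` types are hypothetical, and traced source lines are omitted for brevity.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// One node of a dynamic call tree: a function invocation and its callees.
struct CallNode {
    std::string function;
    std::vector<std::unique_ptr<CallNode>> children;
};

// Hypothetical trace event: a call into `function`, or a return from the current call.
struct Event {
    enum Kind { Call, Return } kind;
    std::string function;  // empty for returns
};

// Rebuilds the call tree for one thread from its ordered call/return events.
std::unique_ptr<CallNode> build_call_tree(const std::vector<Event>& events) {
    auto root = std::make_unique<CallNode>();
    root->function = "<thread>";
    std::vector<CallNode*> stack{root.get()};  // path from the root to the current call
    for (const auto& e : events) {
        if (e.kind == Event::Call) {
            auto child = std::make_unique<CallNode>();
            child->function = e.function;
            CallNode* raw = child.get();
            stack.back()->children.push_back(std::move(child));
            stack.push_back(raw);
        } else if (stack.size() > 1) {
            stack.pop_back();  // unmatched returns (e.g. tracing began mid-call) are ignored
        }
    }
    return root;
}
```

With one such tree per thread, a context switch simply starts a new caption and tree in the display.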

[0065] The executable pane 314 displays an "executable" listbox 361. Each line in the executable listbox 361 displays information about an executable image that is currently being traced, in a filename field 360, a process id (PID) field 362, and a status field 364. Typical values for the status field 364 include "running," "inactive," and "exited." The trace detail pane 316 contains a trace detail tree 350, which is preferably implemented as a conventional hierarchical tree control. The trace detail tree 350 displays attributes, variables such as arguments in a function call window, and function return values of a function selected in the trace tree 330. The source pane 318 displays a source listing of one of the files listed in the source listbox 361. The source listing displayed in the source pane 318 corresponds to the source code of the function selected in the trace tree 330 or to the selected source line. The source code is automatically scrolled to the location of the selected function.

[0066] The frame window 300 also contains a title bar which displays the name of the analyzer 106 and a file name of a log or Trace Control Information (TCI) file that is currently open. If the current file has not yet been saved, the string "-New" is concatenated to the file name display.

[0067] The status bar 312 displays the status of the analyzer 106 (e.g. Ready), the source code file containing the source code listed in the source code pane 318, and the line and column number of a current line in the source pane 318.

[0068] The toolbar 306 provides Windows tooltips and the buttons listed in Table 1.

[0069] FIG. 3B shows a typical frame window 300 with multiple threads in the trace tree pane 310. FIG. 3B shows a separate trace tree for each thread and a thread caption bar (similar to the thread caption bar 320 shown in FIG. 3A) for each thread.

TABLE 1
Buttons on the toolbar 306

Button | Menu equivalent | Key | Description
"Open" | File > Open | Ctrl+O | Opens an existing Trace Control Information file.
"Save" | File > Save | Ctrl+S | Saves the current Trace Control Information to a file.
"Clear" | Edit > Clear All | - | Clears the Trace Tree pane, the Trace Detail pane, and the Source pane.
"Find" | Edit > Find | Ctrl+F | Finds a specific string in the executable source code or trace tree.
"Bookmark" | Edit > Bookmark | - | Adds or deletes a bookmark for the currently selected function, or edits the name of an existing bookmark.
"Window" | View > New Window | - | Opens a new instance of the analyzer.
"Start/Stop" | Executable > Start/Stop Trace | - | Starts or stops tracing the executables listed in the Executable pane.
"Add" | Executable > Add | Ins | Adds an executable to the Executable pane, without running it, so that it can be run and traced at a later date.
"Run" | Executable > Run | F5 | When the <New Executable> string is selected, adds an executable to the Executable pane, starts this executable, and begins tracing. When an executable that is not running is selected in the Executable pane, starts this executable and begins tracing.
"Attach" | Executable > Attach | - | When the <New Executable> string is selected, attaches a running executable to the Executable pane and begins tracing. When an executable that is not traced is selected, attaches the running process of this executable, if it exists.
"Terminate" | Executable > Terminate | - | Terminates the executable currently selected in the Executable pane.
"Options" | Executable > Trace Options | - | Opens the Trace Options window, in which you can specify the elements that you want to trace for the selected executable.

[0070] Using the Analyzer to Create the TCI File

[0071] The TCI file 120 specifies one or more clients 102 and the specific elements (functions, processes and so on) to be traced in either online or remote mode. The TCI information is specified in a trace options window (described in the text associated with FIG. 5). The TCI file 120 is used to save trace control information so that the same trace options can be used at a later time, and to send trace control information to a user 110 to trace the client 102. The subsections that follow provide a general overview of selecting trace information for a TCI file 120, and describe the various trace options, the different ways to access them, and how to use them to specify the elements to be traced.

[0072] The TCI file 120 for a client 102 is interactively generated by the software developer 112 using the analyzer 106. During this process, the analyzer 106 displays the source structure (modules, directories, source files, C++ classes, functions, etc.) of the client 102 using the source code debug information 121 generated by the compiler during compilation of the client 102. As is well known in the art, such debug information 121 may be in an open format (as with a COFF structure), or proprietary format (such as the Microsoft PDB format), and can be accessed using an appropriate application program interface (API). Using the analyzer 106, the developer 112 selects the functions and source code lines to be traced. This information is then translated into addresses and instructions that are recorded within the TCI file. In other embodiments of the invention, trace points may be added to the memory image of the client 102 by scanning the image's object code "on the fly" for specific types of object code instructions to be replaced.

[0073] Trace control information is defined for a specific client 102. To access the trace tool, the developer 112 first adds the desired programs to the list of executables shown in the executable pane 314 of FIG. 3. The executable is preferably compiled so that debug information is available. In many development environments, debug information can be included in an optimized "release" build without affecting the optimization. In a preferred embodiment, the debug information is stored in a PDB file. If the analyzer 106 does not find a PDB file when the executable is added to the Executable pane 314, the developer 112 is prompted to specify the location of the PDB file. Once an executable has been added to the Executable pane 314, the developer 112 can set the trace control information using the trace options described below.

[0074] To use online mode to trace an executable that is not currently running, the developer 112 selects an executable file to run as the client 102. To run an executable file, the developer 112 double-clicks the <New Executable> text 365 in the executable pane 314 to open a file selection window, from which the developer 112 selects the required executable. Alternatively, after selecting the <New Executable> text, the developer 112 can click the Run button on the toolbar 306 or select the Run option from the "Executable" menu. The file selection window provides a command line arguments text box that allows the developer 112 to specify command line arguments for the selected executable file.

[0075] After the developer 112 selects an executable to be a client 102, a trace options window (described below in connection with FIG. 5) is displayed, which allows the developer 112 to specify which functions to trace. After the desired trace options are selected and the trace options window is closed, the executable starts running and BugTrapper starts tracing. As the client 102 runs, trace data is collected and immediately displayed in the analyzer frame window 300 as shown in FIG. 3.

[0076] To cause the analyzer 106 to trace an executable that is currently running, the developer 112 may click the "Attach" button on the toolbar 306 after selecting the <New Executable> text. Upon clicking the "Attach" button, a process list window 400 is displayed, as shown in FIG. 4. The process list window 400 displays either an applications list 402 or a process list (not shown). One skilled in the art will understand that, in the Windows operating system, an application is a process that is attached to a top-level window. The applications list 402 displays all of the applications that are currently running, and the process list displays all of the processes that are currently running. The applications list 402 is selected for display by an applications list tab, and the process list is selected for display by a process list tab. To select a process from the process list window, the developer 112 clicks the Applications tab or the Processes tab as required, and then selects the application or process to be traced. The process list window 400 also provides a refresh button to refresh the application list and the process list, and an OK button to close the process list window 400.

[0077] After the developer 112 selects an application or process using the process list window 400 and closes the process list window 400, the analyzer 106 displays a trace options window 500, as shown in FIG. 5. The application or process selected in the process list window 400 becomes the client 102. The analyzer 106 can display trace data for multiple processes and applications (multiple clients); however, for the sake of simplicity, the operation of the analyzer 106 is described below primarily in terms of a single client 102. The trace options window 500 allows the developer 112 to select the functions to be traced. Selecting trace options is described below in connection with FIG. 5. After the trace options are selected and the trace options window 500 is closed, the client-side trace library 125 is attached to the client 102, and the client 102 continues to run. The client-side trace library 125 thereafter collects trace information that reflects the execution of the client 102 and sends the trace information to the analyzer 106 for display.

[0078] The developer can also add an executable file (e.g. a Windows .exe file) to the executable pane 314 without actually running the executable file. To add an executable that is not currently running (and which is not to be run yet) to the executable pane 314, the developer 112 selects the <New Executable> text 365 and then clicks the Add button on the toolbar 306, whereupon a file selection window is displayed. The developer 112 uses the file selection window to select the desired executable and closes the file selection window. The file selection window provides a text field to allow the developer to enter command line arguments for the executable. Upon closing the file selection window, the trace options window 500 is displayed, which enables the developer 112 to select the functions to trace. After the trace options are selected and the trace options window is closed, the selected executable is inserted into the Executable pane 314 with the status "Inactive." The developer can then begin a trace on the inactive executable by selecting the executable in the executable pane 314 and clicking the "Run" or "Attach" button on the toolbar 306.

[0079] In a preferred embodiment, the developer 112 can only create a new TCI file 120 when the executable list 361 contains the names of one or more executable files. To create a TCI file 120, the developer 112 selects "Save" from the "File" menu. The developer can also open a previously saved TCI file 120 and then modify it using the trace options window 500. Once a TCI file 120 has been created (or opened), the developer 112 can select an executable from the executable pane and click the "Run" or "Attach" button on the toolbar to start tracing.

[0080] FIG. 5 illustrates the trace options window 500. The trace options window 500 is divided into two panes, a filter tree pane 501 and a source code pane 504. The filter tree pane 501 is a multi-page pane having four pages: a file page 602, which is selected by a file tab 510; a class page 702, which is selected by a class tab 512; a name page 502, which is selected by a name tab 514; and a process page 802, which is selected by a process tab 516. The name page 502 is shown in FIG. 5. The file page 602 is shown in FIG. 6, the class page 702 is shown in FIG. 7, and the process page 802 is shown in FIG. 8. The trace options window also provides an "advanced" button 520 and an "add DLL" button 522.

[0081] The trace options window 500 allows the developer 112 to specify which functions to trace and what to display in the trace tree 330. The trace options window 500 also allows the developer 112 to filter out functions that have already been traced. These functions are redisplayed where they were traced if they are later re-selected for tracing. If a function is not selected for tracing in the trace options window 500, it is not displayed in the trace tree 330. If a function that was not traced is filtered in again, it does not appear in the portion of the trace information that has already been displayed.
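The filtering rules in the preceding paragraph can be modeled as two separate decisions: whether a call is recorded at all (only while its function is selected), and whether a recorded call is shown (only while its function is selected at the time the view is drawn). The `FilteredTrace` class below is a hypothetical sketch of that model; it does not reproduce the tear markers the analyzer displays.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of trace filtering: the trace library records a call
// only if its function is selected at that moment, and the view shows a
// recorded call only if its function is selected when the view is drawn.
class FilteredTrace {
public:
    void select(const std::string& f) { selected_.insert(f); }
    void deselect(const std::string& f) { selected_.erase(f); }

    // Called for every executed function; calls to unselected functions are never stored.
    void on_call(const std::string& f) {
        if (selected_.count(f)) recorded_.push_back(f);
    }

    // Rebuilds the displayed list from what was recorded and what is selected now.
    std::vector<std::string> view() const {
        std::vector<std::string> shown;
        for (const auto& f : recorded_)
            if (selected_.count(f)) shown.push_back(f);
        return shown;
    }

private:
    std::set<std::string> selected_;
    std::vector<std::string> recorded_;
};
```

For instance, if f2 is deselected between two loop iterations and later re-selected, the f2 call from the first iteration reappears in the view, while the call made during the deselected interval never does, because it was never recorded.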

[0082] For example, consider the following C++ program:

    #include <cstdio>

    void f1() {}
    void f2() {}

    int main() {
        while (true) {
            std::getchar();  // wait for input before each iteration
            f1();
            f2();
        }
    }

[0083] Using the above program as an example of a client 102, assume that the user performs the following steps:

[0084] 1. Select the functions f1, f2, and main for tracing in the trace options window 500.

[0085] 2. Execute one loop and view the resulting trace.

[0086] 3. Deselect (filter out) f2 for tracing in the Trace Options window 500.

[0087] 4. Execute the loop again.

[0088] 5. Re-select (filter in) f2 for tracing in the Trace Options window.

[0089] 6. Execute the loop once more.

[0090] Then, after Step 4, the following elements are displayed in the trace window, with the symbol ~~~ representing a tear in the trace, as described below in connection with FIG. 12:
main f1