United States Patent
6801940
Moran; et al.
October 5, 2004
Title
Application performance monitoring expert
Abstract
A system, method, and computer program product are provided for expert application performance analysis. An application is monitored. Performance data is gathered during the monitoring. A set of metrics is generated based on the performance data. A performance of the application is measured from at least one of a client perspective, a server perspective, and a network perspective using the metrics.
Inventors:
Moran; Mike (Sandwich, IL), Liubinskas; Tauras (Aurora, IL), Goral; Jack (Woodridge, IL)
Assignee:
Networks Associates Technology, Inc. (Santa Clara)
Appl. No.:
045773
Filed:
January 11, 2002
Current U.S. Class:
709/224
709/228
370/230
Current International Class:
H04L 12/56 (20060101)
Field of Search:
709/203,216,218,220,223,224,225,226,228,235 705/1 370/245,230,329 714/25
U.S. Patent Documents
5964837, October 1999, Chao et al.
5983278, November 1999, Chong et al.
5999908, December 1999, Abelow
6023507, February 2000, Wookey
6078956, June 2000, Bryant et al.
6285658, September 2001, Packer
6363421, March 2002, Barker et al.
6427063, July 2002, Cook et al.
6469991, October 2002, Chuah
6601020, July 2003, Myers
6633835, October 2003, Moran et al.
6701363, March 2004, Chiu et al.
6714976, March 2004, Wilson et al.
Primary Examiner:
Jean; Frantz B.
Assistant Examiner:
Dinh; Khanh Quang
Attorney, Agent or Firm:
Silicon Valley IP Group, PC Zilka; Kevin J. Hamaty; Christopher J.
Parent Case Text
RELATED APPLICATION
This application is a continuation of a parent application entitled "MULTI-SEGMENT NETWORK APPLICATION MONITORING AND CORRELATION ARCHITECTURE" and naming Mike Moran, Tauras Liubinskas, and Jack Goral as inventors, and which was filed Jan. 10, 2002 under Ser. No. 10/043,501 and attorney docket number NAI1P050/02.003.01, and which is incorporated herein by reference in its entirety.
Claims
What is claimed is:
1. A method for expert application performance analysis, comprising: receiving a set of enabled applications; monitoring a network for traffic related to the enabled applications; filtering performance data relating to the enabled applications from the network traffic; categorizing the performance data into flows; prioritizing the flows; processing the flows based on the priority; generating a set of metrics in real time based on the processed flows; measuring a performance of the applications from a client perspective, a server perspective, and a network perspective using the metrics; and performing threshold-based actions based on the metrics; wherein the performance data is gathered for transaction-oriented transactions, stream-oriented transactions, and throughput-oriented transactions; wherein the metrics generated for the transaction-oriented transactions include a command time per transaction, a response time per transaction, an elapsed time from a start of a command to a start of a response, an elapsed time from a start of a command to an end of a response, and a number of failures; wherein the metrics generated for the stream-oriented transactions include a type of service expected during setup, a type of service actually received, a number of transactions, a number of successful transactions, and a ratio for an accumulated time of disrupted service over transaction time; wherein the metrics generated for the throughput-oriented transactions include a number of transactions, a number of successful transactions, throughput calculations per transaction, byte rate during the transaction, and response size; and identifying application subtypes within the application; wherein flows of the performance data are prioritized; wherein multiple applications are monitored; wherein each of the applications is monitored simultaneously when in a flat mode; wherein each of the applications is monitored sequentially when in a roving mode; wherein the sequential monitoring is based on an allotted amount of time.
2. A method as set forth in claim 1, wherein an application server module is included.
3. A method as set forth in claim 2, wherein the application server module includes a system controller, administrative functions, and a user interface.
4. A method as set forth in claim 1, wherein a gigabit Ethernet media module is included.
5. A method as set forth in claim 4, wherein the gigabit Ethernet media module includes an analysis engine, and expert applications.
6. A method as set forth in claim 1, wherein a probe enclosure is included.
7. A method as set forth in claim 6, wherein the probe enclosure houses an application server module and a media module.
8. A method as set forth in claim 1, wherein a shelf enclosure is included.
9. A method as set forth in claim 8, wherein the shelf enclosure houses an application server module and a plurality of media modules.
10. A system for expert application performance analysis capable of carrying out a method, the method comprising: receiving a set of enabled applications; monitoring a network for traffic related to the enabled applications; filtering performance data relating to the enabled applications from the network traffic; categorizing the performance data into flows; prioritizing the flows; processing the flows based on the priority; generating a set of metrics in real time based on the processed flows; measuring a performance of the applications from a client perspective, a server perspective, and a network perspective using the metrics; and performing threshold-based actions based on the metrics; wherein the performance data is gathered for transaction-oriented transactions, stream-oriented transactions, and throughput-oriented transactions; wherein the metrics generated for the transaction-oriented transactions include a command time per transaction, a response time per transaction, an elapsed time from a start of a command to a start of a response, an elapsed time from a start of a command to an end of a response, and a number of failures; wherein the metrics generated for the stream-oriented transactions include a type of service expected during setup, a type of service actually received, a number of transactions, a number of successful transactions, and a ratio for an accumulated time of disrupted service over transaction time; wherein the metrics generated for the throughput-oriented transactions include a number of transactions, a number of successful transactions, throughput calculations per transaction, byte rate during the transaction, and response size; and identifying application subtypes within the application; wherein flows of the performance data are prioritized; wherein multiple applications are monitored; wherein each of the applications is monitored simultaneously when in a flat mode; wherein each of the applications is monitored sequentially when in a roving mode; wherein the sequential monitoring is based on an allotted amount of time.
11. A system as set forth in claim 10, wherein an application server module is included.
12. A system as set forth in claim 11, wherein the application server module includes a system controller, administrative functions, and a user interface.
13. A system as set forth in claim 10, wherein a gigabit Ethernet media module is included.
14. A system as set forth in claim 13, wherein the gigabit Ethernet media module includes an analysis engine, and expert applications.
15. A system as set forth in claim 10, wherein a probe enclosure is included.
16. A system as set forth in claim 15, wherein the probe enclosure houses an application server module and a media module.
17. A system as set forth in claim 10, wherein a shelf enclosure is included.
18. A system as set forth in claim 17, wherein the shelf enclosure houses an application server module and a plurality of media modules.
19. A computer program product embodied on a computer readable medium for expert application performance analysis capable of carrying out a method, the method comprising: receiving a set of enabled applications; monitoring a network for traffic related to the enabled applications; filtering performance data relating to the enabled applications from the network traffic; categorizing the performance data into flows; prioritizing the flows; processing the flows based on the priority; generating a set of metrics in real time based on the processed flows; measuring a performance of the applications from a client perspective, a server perspective, and a network perspective using the metrics; and performing threshold-based actions based on the metrics; wherein the performance data is gathered for transaction-oriented transactions, stream-oriented transactions, and throughput-oriented transactions; wherein the metrics generated for the transaction-oriented transactions include a command time per transaction, a response time per transaction, an elapsed time from a start of a command to a start of a response, an elapsed time from a start of a command to an end of a response, and a number of failures; wherein the metrics generated for the stream-oriented transactions include a type of service expected during setup, a type of service actually received, a number of transactions, a number of successful transactions, and a ratio for an accumulated time of disrupted service over transaction time; wherein the metrics generated for the throughput-oriented transactions include a number of transactions, a number of successful transactions, throughput calculations per transaction, byte rate during the transaction, and response size; and identifying application subtypes within the application; wherein flows of the performance data are prioritized; wherein multiple applications are monitored; wherein each of the applications is monitored simultaneously when in a flat mode; wherein each of the applications is monitored sequentially when in a roving mode; wherein the sequential monitoring is based on an allotted amount of time.
Description
FIELD OF THE INVENTION
The present invention relates to network monitoring and management, and more particularly to expert services in a network application monitoring system.
BACKGROUND OF THE INVENTION
Networks are used to interconnect multiple devices, such as computing devices, and allow the communication of information between the various interconnected devices. Many organizations rely on networks to communicate information between different individuals, departments, work groups, and geographic locations. In many organizations, a network is an important resource that must operate efficiently. For example, networks are used to communicate electronic mail (e-mail), share information between individuals, and provide access to shared resources, such as printers, servers, and databases. A network failure or inefficient operation may significantly affect the ability of certain individuals or groups to perform their required functions.
A typical network contains multiple interconnected devices, including computers, servers, printers, and various other network communication devices such as routers, bridges, switches, and hubs. The multiple devices in a network are interconnected with multiple communication links that allow the various network devices to communicate with one another. If a particular network device or network communication link fails or underperforms, multiple devices, or the entire network, may be affected.
Network management is the process of managing the various network devices and network communication links to provide the necessary network services to the users of the network. Typical network management systems collect information regarding the operation and performance of the network and analyze the collected information to detect problems in the network. For example, a high network utilization or a high network response time may indicate that the network (or a particular device or link in the network) is approaching an overloaded condition. In an overloaded condition, network devices may be unable to communicate at a reasonable speed, thereby reducing the usefulness of the network. In this situation, it is important to identify the network problem and the source of the problem quickly and effectively such that the proper network operation can be restored.
Often, applications running on the network are a source of the aforementioned problems or are adversely affected by such problems. There is thus a continuing need for a new application-monitoring system for domestic enterprise management. Such a system should enable administrators (such as Network Managers) and service providers to introduce real-time application monitoring into service offerings. There is also a need to offer application monitoring since a large number of business and end users stand to gain significant understanding of their network applications, performance, and security.
SUMMARY OF THE INVENTION
A system, method, and computer program product are provided for expert application performance analysis. An application is monitored. Performance data is gathered during the monitoring. A set of metrics is generated based on the performance data. A performance of the application is measured from at least one of a client perspective, a server perspective, and a network perspective using the metrics.
In one embodiment, a set of enabled applications is received. A network is monitored for traffic related to the enabled applications. Performance data relating to the enabled applications is filtered from the network traffic and categorized into flows. The flows are prioritized, with low priority data going to a low-priority queue to reduce the packet arrival data to prevent dropping of packets. Note that this can also include giving each flow the same priority or no priority. The flows are processed based on the priority. A set of metrics is generated in real time based on the processed flows. A performance of the applications is measured using the metrics.
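The sequence in this embodiment (filter by enabled application, categorize into flows, prioritize, then process by priority) can be illustrated with a minimal sketch. It is not the patented implementation: the `Packet` fields, the flow key of (source, destination, application), the two-queue scheme, and the per-application byte count used as a stand-in metric are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    app: str   # application label, assumed to be decoded already
    size: int  # payload size in bytes


def build_queues(packets, enabled_apps, low_priority_apps):
    """Filter packets, group them into flows, and split flows by priority."""
    flows = defaultdict(list)
    for pkt in packets:
        if pkt.app not in enabled_apps:
            continue  # filtering: ignore traffic for applications not enabled
        # categorizing: one flow per (source, destination, application)
        flows[(pkt.src, pkt.dst, pkt.app)].append(pkt)
    high, low = deque(), deque()
    for key, pkts in flows.items():
        # prioritizing: hypothetical rule based on the application name
        queue = low if key[2] in low_priority_apps else high
        queue.append((key, pkts))
    return high, low


def process(high, low):
    """Drain the high-priority queue first, then the low-priority queue."""
    order = []
    bytes_per_app = defaultdict(int)
    while high or low:
        key, pkts = (high if high else low).popleft()
        order.append(key)
        bytes_per_app[key[2]] += sum(p.size for p in pkts)
    return order, dict(bytes_per_app)
```

Giving every flow the same priority, as the text permits, corresponds to passing an empty `low_priority_apps` set so that all flows land in a single queue.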
In an embodiment, performance data is gathered for transaction-oriented transactions, stream-oriented transactions, and/or throughput-oriented transactions. The metrics generated for the transaction-oriented transactions may include a command time per transaction, a response time per transaction, an elapsed time from a start of a command to a start of a response, an elapsed time from a start of a command to an end of a response, and/or a number of failures. The metrics generated for the stream-oriented transactions may include a type of service expected during setup, a type of service actually received, a number of transactions, a number of successful transactions, and/or a ratio for an accumulated time of disrupted service over transaction time. The metrics generated for the throughput-oriented transactions can include a number of transactions, a number of successful transactions, throughput calculations per transaction, byte rate during the transaction, and/or response size.
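As an illustration of the transaction-oriented metrics named above, the following sketch derives them from four timestamps per transaction. The `Transaction` record, its field names, and the rule that a transaction with no complete response counts as a failure are assumptions for this example; the text does not specify how a failure is detected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transaction:
    command_start: float
    command_end: float
    response_start: Optional[float] = None  # None if no response was seen
    response_end: Optional[float] = None    # None if the response never completed


def transaction_metrics(txns: List[Transaction]) -> dict:
    """Compute per-transaction timing metrics and a failure count."""
    rows = []
    failures = 0
    for t in txns:
        # Assumed failure rule: the response is missing or incomplete.
        if t.response_start is None or t.response_end is None:
            failures += 1
            continue
        rows.append({
            "command_time": t.command_end - t.command_start,
            "response_time": t.response_end - t.response_start,
            "command_start_to_response_start": t.response_start - t.command_start,
            "command_start_to_response_end": t.response_end - t.command_start,
        })
    return {"per_transaction": rows, "failures": failures}
```

For example, a command sent from t = 0.0 to 0.5 whose response runs from t = 2.0 to 3.0 has a command time of 0.5, a response time of 1.0, and elapsed times of 2.0 and 3.0 measured from the start of the command.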
In another embodiment, an application content expert can be used to identify application subtypes within the application to identify tunneled applications and generate more precise metrics.
In a further embodiment, multiple applications are monitored. Each of the applications is monitored simultaneously when in a flat mode. Each of the applications is monitored sequentially when in a roving mode. As an option, the sequential monitoring can be based on an amount of time allotted to each flow and/or application.
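The difference between the two modes can be sketched as a schedule of monitoring windows. This is only an illustration under assumed names: `monitoring_schedule`, the `slice_time` parameter, and the round-robin order in roving mode are hypothetical, and the sketch allots time per application only.

```python
def monitoring_schedule(apps, mode, total_time, slice_time=1.0):
    """Return a list of (start, end, monitored_apps) windows.

    flat:   every application is monitored for the whole period.
    roving: applications are monitored one at a time, each for an
            allotted slice, cycling until the period ends.
    """
    if not apps:
        raise ValueError("at least one application is required")
    if mode == "flat":
        return [(0.0, total_time, tuple(apps))]
    if mode == "roving":
        if slice_time <= 0:
            raise ValueError("slice_time must be positive")
        windows = []
        start = 0.0
        index = 0
        while start < total_time:
            end = min(start + slice_time, total_time)
            windows.append((start, end, (apps[index % len(apps)],)))
            start = end
            index += 1
        return windows
    raise ValueError(f"unknown mode: {mode}")
```

With two applications, a 5-second period, and a 2-second slice, roving mode yields three windows: the first application, the second, then the first again for the final second.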
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a representation of a system architecture according to one embodiment.
FIG. 2 shows a representative hardware environment that may be associated with the workstations of FIG. 1, in accordance with one embodiment.
FIG. 3 illustrates an Application Monitor system according to one embodiment.
FIG. 4 is a diagram illustrating a system configuration for incorporating multiple nodes with centralized management.
FIG. 5 shows the basic hardware configuration of a Probe.
FIG. 6 shows the basic hardware configuration of the shelf system.
FIG. 7 depicts an illustrative CPCI module.
FIG. 8 depicts an HDD rear transition module (RTM).
FIG. 9A is a drawing of RTM usage in a multi-interface configuration.
FIG. 9B depicts RTM usage in a single-interface configuration.
FIG. 10 depicts CPCI bus transfer modes.
FIG. 11 shows an illustrative CPCI related hardware subclassification tree.
FIG. 12 depicts an operational environment including a node along with a set of environmental entities, which the node interacts with.
FIG. 13 is a table listing a sub-classification of users.
FIG. 14 is a high-level diagram that shows basic components of application server hardware.
FIG. 15 shows the application server top-level subsystems and dependencies.
FIG. 16 shows the UI servers provided by the Application Server.
FIG. 17 shows the primary run-time flows between application server subsystems and UI servers.
FIG. 18 is a diagram showing a Multi-Interface (MI) Expert server and its related subsystems.
FIG. 19 depicts an RMON services subsystem and its primary flows.
FIG. 20 shows the primary flows associated with the logging manager.
FIG. 21 depicts several application server object repository packages.
FIG. 22A shows an example managed object containment view of a node as seen by the application server.
FIG. 22B depicts an example managed object containment view of a media module as seen by the application server.
FIG. 23 is a flow diagram of a process in which the configuration manager uses the compatibility objects as a rules base for managing version and capability relationships between the system and its modules (hardware and software).
FIG. 24 shows some of the relationships between the registry services and other subsystems.
FIG. 25 depicts registry entry object associations.
FIG. 26 shows a collection of triggers and trigger groups.
FIG. 27 depicts the major subsystems of the media module and their dependencies.
FIG. 28 is a high-level diagram that shows basic components of the media module hardware and dependencies.
FIG. 29 shows a top-level view of a PMD subsystem.
FIG. 30 shows a top-level view of a capture subsystem.
FIG. 31 shows a top-level view of a shared memory subsystem.
FIG. 32 shows a top-level view of a focus subsystem.
FIG. 33 shows the media module top-level subsystems and dependencies.
FIG. 34 shows the main components of the media module expert subsystem.
FIG. 35 illustrates a top-level Media Module Expert component classification.
FIG. 36 shows an example sub-classification of application expert components and the relation to a few application protocols.
FIG. 37 depicts a process for expert application performance analysis according to one embodiment.
FIG. 38 illustrates RMON object dependencies and persistence levels.
FIG. 39 shows the pipelined (flow processing and expert processing) filter and buffer components provided by the media module.
FIG. 40 depicts a process for adaptive priority data filtering according to an embodiment.
FIG. 41 is a media module general processing flow.
FIG. 42 is a high-level media module packet processing sequence diagram.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Embodiments of the present invention identify electronic mail messages and other types of network communications that are suspected of being infected by malicious code, and quarantine such messages and communications having potentially malicious content. The identification of this potentially malicious content may be accomplished utilizing heuristics. Examples of such heuristics are provided below.
FIG. 1 illustrates a network architecture 100, in accordance with one embodiment. As shown, a plurality of remote networks 102 are provided including a first remote network 104 and a second remote network 106. Also included is at least one gateway 107 coupled between the remote networks 102 and a proximate network 108. In the context of the present network architecture 100, the networks 104, 106 may each take any form including, but not limited to a local area network (LAN), a wide area network (WAN) such as the Internet, etc.
In use, the gateway 107 serves as an entrance point from the remote networks 102 to the proximate network 108. As such, the gateway 107 may function as a router, which is capable of directing a given packet of data that arrives at the gateway 107, and a switch, which furnishes the actual path in and out of the gateway 107 for a given packet.
Further included is at least one data server 114 coupled to the proximate network 108, and which is accessible from the remote networks 102 via the gateway 107. It should be noted that the data server(s) 114 may include any type of computing device/groupware. Coupled to each data server 114 is a plurality of user devices 116. Such user devices 116 may include a desktop computer, lap-top computer, hand-held computer, printer or any other type of logic. It should be noted that a user device 117 may also be directly coupled to any of the networks, in one embodiment.
A monitoring system 120 is coupled to a network 108. Illustrative monitoring systems will be described in more detail below. It should be noted that additional monitoring systems and/or components thereof may be utilized with any type of network element coupled to the networks 104, 106, 108. In the context of the present description, a network element may refer to any component of a network.
FIG. 2 shows a representative hardware environment associated with a user device 116 of FIG. 1, in accordance with one embodiment. Such figure illustrates a typical hardware configuration of a workstation having a central processing unit 210, such as a microprocessor, and a number of other units interconnected via a system bus 212.
The workstation shown in FIG. 2 includes a Random Access Memory (RAM) 214, Read Only Memory (ROM) 216, an I/O adapter 218 for connecting peripheral devices such as disk storage units 220 to the bus 212, a user interface adapter 222 for connecting a keyboard 224, a mouse 226, a speaker 228, a microphone 232, and/or other user interface devices such as a touch screen and a digital camera (not shown) to the bus 212, communication adapter 234 for connecting the workstation to a communication network 235 (e.g., a data processing network) and a display adapter 236 for connecting the bus 212 to a display device 238.
The workstation may have resident thereon an operating system such as the Microsoft Windows® NT or Windows® 2000 Operating System (OS), the IBM OS/2 operating system, the MAC OS, or UNIX operating system. It will be appreciated that a preferred embodiment may also be implemented on platforms and operating systems other than those mentioned. A preferred embodiment may be written using JAVA, C, and/or C++ language, or other programming languages, along with an object oriented programming methodology. Object oriented programming (OOP) has become increasingly used to develop complex applications.
The following sections provide a high-level description of an architecture of a system for monitoring and managing a network according to an embodiment. The system includes a set of application monitoring and management tools that provide business critical application and network performance information to administrators such as CIOs and enterprise network managers.
The new application-monitoring system is provided for domestic enterprise management. One purpose of this system is to enable administrators (such as CIOs and Network Managers) to introduce real-time application monitoring into service offerings. There is a need to offer application monitoring since a large number of business and end users stand to gain significant understanding of their network applications, performance, and security.
One embodiment provides distributed multi-segment network monitoring and correlation, with a focus on application performance. This multi-segment capability can be extended to multi-site monitoring and correlation (e.g. nodes placed at different geographical locations). The system is preferably based on a scalable, high-performance, open architecture, which can be easily adapted to support many different topologies and features.
Topologies
FIG. 3 illustrates an Application Monitor system 300 according to one embodiment. As shown, the system can include the following topologies:
1. Single-interface probe 302
2. Multi-interface (shelf-based) system 304
In any topology, the system includes two major components: a single application server module and one or more Media Modules. The role of the media module is to provide a physical observation point of network traffic on a given segment 306. The application server provides all administrative functions (i.e. user interface, provisioning, reports, alarms and statistics, Simple Network Management Protocol (SNMP) agent, etc.) for the system. In the single-interface configuration, a single monitoring interface is available in a self-contained, managed device, similar to a typical Remote Network Monitoring (RMON) probe.
In the multi-interface configuration, a larger system is possible by providing multiple interfaces (Media Modules), which allows monitoring and real-time correlation of multiple (co-located) network segments 308. Preferably, in both arrangements, no higher-layer management console is required. This second configuration also allows the mixing and matching of different media module types. One exemplary benefit of this configuration would be to monitor traffic seen on the WAN-side of a router, on a backbone, and on individual branch segments all from the same system, providing a complete network view from a single administrative point.
Administrative Domains
As mentioned in the previous section, the system is a self-managed device, meaning that no additional EMS/NMS functionality is required for any of the supported features. In use, a user can connect directly to the node using any standard web browser and immediately receive alarms, statistics and diagnosis, configure triggers, view reports, etc.
In a multi-location topology, however, a network manager may desire to incorporate multiple, physically separate nodes (shelf 304 or probe 302) under one management umbrella. FIG. 4 is a diagram illustrating a system configuration 400 for incorporating multiple nodes with centralized management. As shown, this may be accomplished using one of the following approaches:
1. Using Simple Network Management Protocol (SNMP) from a central management console 402
2. Using application server software 404 running on a workstation
Again, a user can connect directly to the node using any standard web browser 406. The second approach offers many benefits over a standard SNMP manager including enhanced correlation, multi-interface "Expert" functions, self-similar topology views, a rich set of triggers, system auto-discovery, etc.
Illustrative Features
The Application Monitoring system is a high performance, scalable monitoring and analysis tool using custom, purpose-built hardware. Furthermore, the system provides advanced network and application performance monitoring capability to enterprise network managers and CIOs.
Table 1 lists some exemplary features.
TABLE 1
Robust 24x7 "always-on" network and application monitoring
High performance Compact PCI based architecture
Single or multiple (co-located) interfaces in common chassis
Full gigabit line rate statistics and capture
Real-time deep packet flow classification and filtering per interface
RMON 1, 2 and 3 (APM) functionality per interface
Real-time Expert monitoring and alarms
Multi-interface (correlated) RMON and Expert statistics and alarms
Integrated network management and web-based user interface functionality
Flexible application customization via trigger scripts
Capability to mix and match multiple interfaces and interface types in same shelf
Completely field upgradeable (remote download and configuration)
Secure multi-client, multi-privilege-level end user authentication
Applications
The system platform can support a multitude of monitoring and analysis applications due to its open architecture and inherent flow classification capabilities. Table 2 is a partial list of applications provided by the system. These include real-time application monitoring and diagnostic services.
TABLE 2
Performance and SLA management - Application and network response time, distributions, etc.
RMON1, 2 and 3 (Application Performance Monitoring)
Security management - IDS, Theft Of Service, DOS, DDOS, etc.
Policy management - Access violations, illegal content, bandwidth over-use, etc.
Network engineering - Reports showing where to increase capacity, add routers, etc.
Accounting - Bill-back by application usage, department, lost revenue, etc.
Quality of Service (QOS) management
Report generation and logging
Fault isolation and troubleshooting
Application performance monitoring (single and multi-interface)
Application distribution statistics (by user, domain, VLAN, server, interface, etc.)
RMON1, 2 and 3 (APM) capabilities via SNMP agent
Flow classification for tracking applications between endpoints (servers, hosts, groups)
Observed QOS and SLA metrics
Security monitoring and alerts
Generation of alarms and traps on any user selected criteria
Diagnostic information for detected anomalies
Fault isolation (when used in multi-site configurations)
Multi-user, multi session web-based user interface
User-customizable applications via trigger scripts
Extensibility
Again, given the open architecture, the system according to one embodiment is extensible in the areas shown in Table 3.
TABLE 3
New or enhanced applications via software download
New or higher performance media modules
Addition of new hardware feature modules (GPS, etc.)
Custom applications via trigger scripting
System Hardware Components
A system hardware architecture according to a preferred embodiment is described below. The system hardware architecture in this example is based on the Compact PCI (CPCI) multi-processor computer platform. The configurations can use a chassis, power supplies and system controller (single board computer) module. Hardware modules can be developed per physical media type (i.e. ATM, Gigabit Ethernet, etc.) but all share a common design above the media-dependent portion. Note that the description of this preferred embodiment is presented by way of example only and one skilled in the art will appreciate that variations may be made to the various embodiments without straying from the spirit and scope of the present invention.
Illustrative components included in the system are listed in Table 4.
TABLE 4
Application Server Module - system controller, administrative functions and user interface
Gigabit Ethernet Media Module - analysis engine, physical line I/Fs, RMON and Expert applications
Probe Enclosure - small 2U CPCI chassis, houses one Application Server and one Media Module
Shelf Enclosure - 16 slot CPCI chassis, houses one Application Server and several Media Modules
The system can include the following Compact PCI compliant components, for example:
Backplanes:
1. The 2U backplane supports 64-bit or 32-bit bus transfers at 66 or 33 MHz
2. The multi-slot backplane supports 64-bit or 32-bit bus transfers at 33 MHz
Primary Hardware modules (6U CPCI Cards):
1. A single "Application Server" module--CPCI single board computer
2. One or more "Media Modules"--analysis engine and monitoring interface
3. CPCI option boards--GPS timing module, RAID interface, etc. as needed
Additional Modules:
1. Rear Transition Module (RTM) HDD board--provides hard drive, serial port and Ethernet for any primary hardware module. Note that this module is always required for the application server and is optional for media modules (in multi-slot configurations).
2. PMC (daughter-card) option modules for application server
The Compact PCI specification allows the use of multiple bus masters in a system and includes support for the items shown in Table 5.
TABLE 5
Plug and Play detection of hardware and auto configuration of memory and interrupts
Transfer rates of 66+ MHz at 64+ bits (e.g. 4.2 Gb/S)
Multi-master arbitration for shared resources (targets)
Burst DMA to/from any target by any master
Dual-mode (target/initiator) operation for transparent agents
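As a quick check on the transfer-rate figure in Table 5, the theoretical peak rate of a parallel bus is its clock rate multiplied by its width. The sketch below is illustrative only: the function name is an assumption, and the result is an upper bound that ignores arbitration, wait states and protocol overhead.

```python
def peak_bus_rate(clock_mhz: float, width_bits: int) -> tuple:
    """Return the theoretical peak rate as (Gbit/s, Mbyte/s).

    Assumes one transfer of `width_bits` per clock cycle; real buses
    deliver less because of arbitration and protocol overhead.
    """
    bits_per_second = clock_mhz * 1e6 * width_bits
    gbit_per_s = bits_per_second / 1e9
    mbyte_per_s = bits_per_second / 8 / 1e6
    return gbit_per_s, mbyte_per_s
```

Under these assumptions, `peak_bus_rate(66, 64)` gives about 4.2 Gbit/s, in line with the Table 5 example, and `peak_bus_rate(33, 32)` gives 132 Mbyte/s for the slowest bus configuration.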
FIG. 5 shows the basic hardware configuration of a Probe 302. Various combinations are possible for the two configurations; however in general the stand-alone probe can use a 2U pizza-box chassis 502 populated with a single media module 504 and application server Module 506.
FIG. 6 shows the basic hardware configuration of the shelf system 304. The shelf system can use a 16-slot chassis 602 populated with a single application server Module 604 and one or more Media Modules 606. It should be noted that the application server and media module designs are reusable in any CPCI enclosure.
CPCI Modules
FIG. 7 depicts an illustrative Compact PCI (CPCI) module 700. All hardware modules can conform to the PICMG 2.0 R3.0 Compact PCI Core Specification, which defines a shared 32 or 64-bit data transfer path running at 33 or 66 MHz, a set of standard board profiles, an optional rear transition module (rear I/O) per slot, and one or more optional PMC (mezzanine) daughter cards per standard board.
The standard board sizes can be based on a Euro-card format and are typically available in two primary sizes, as listed in the following table.
TABLE 6
3U profile - 116.675 mm by 160 mm
6U profile - 233.35 mm by 160 mm (type used in system)
In addition, these boards have a height profile, which dictates how many backplane slots they occupy. The common single-slot profile is referred to as "4HP". Boards may be of this unit height or multiples of it such as 8HP (double-slot), 12HP (triple-slot), etc.
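The slot arithmetic implied by the HP profile can be sketched as follows. The function names are hypothetical, and the chassis calculation assumes that every slot is usable and that no option boards are installed, which the text does not state.

```python
HP_PER_SLOT = 4  # single-slot profile ("4HP") as described in the text


def slots_occupied(profile_hp: int) -> int:
    """Number of backplane slots a board with the given HP profile occupies."""
    if profile_hp <= 0 or profile_hp % HP_PER_SLOT != 0:
        raise ValueError("profile must be a positive multiple of 4HP")
    return profile_hp // HP_PER_SLOT


def max_media_modules(chassis_slots: int, server_hp: int = 4, media_hp: int = 8) -> int:
    """Upper bound on media modules that fit beside one application server.

    Assumption: all slots are usable and only these two board types are fitted.
    """
    free_slots = chassis_slots - slots_occupied(server_hp)
    return max(free_slots, 0) // slots_occupied(media_hp)
```

With a 4HP application server and 8HP media modules, a 16-slot chassis would hold at most seven media modules under these assumptions.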
Application Server Module
The application server module, according to an illustrative embodiment, is a 6U, 4HP (single-slot) CPCI single-board computer (SBC) module which acts as the CPCI system controller in any configuration. The role of the system controller is generally to configure any peripheral modules via plug-and-play auto detection. This includes assignment of memory address ranges, identifying bus number, slot number, hot-swap and bus-master capabilities, etc. All CPCI backplanes have at least one designated "system-slot" where the system controller resides. The application server therefore is responsible for detecting, configuring, managing and downloading software to all media modules in a given system. The following table lists some of the application server hardware attributes.
TABLE 7
SBC conforming to PICMG 2.0 R3.0 Compact PCI Core Specification
Conforms to PICMG 2.1 R2.0 Compact PCI Hot Swap Specification
Supports requirements for the Compact PCI system slot controller
Supports 32-bit, 33 MHz PCI-to-PCI bridge operation
Supports 64-bit, 33 MHz PCI-to-PCI bridge operation
Supports 64-bit, 66 MHz PCI-to-PCI bridge operation
Supports the 6U Euro-card 4HP single slot size (233.35 mm by 160 mm) format
Uses the Intel Pentium 3 processor (850 MHz)
Supports removable SODIMM memory in the following configurations: 128 Mbytes, 256 Mbytes, 512 Mbytes, 1 Gbyte
Supports the Compact PCI Compact-Flash IDE interface
Contains two PMC expansion sites
Supports remote Ethernet booting
Contains one 10/100 Ethernet interface through the front bezel faceplate
Supports an additional 10/100 Ethernet port through the RTM interface
Contains SVGA interface through the front bezel faceplate
Contains a keyboard interface through the front bezel faceplate
Contains a mouse interface through the front bezel faceplate
Contains a serial port interface through the RTM interface
Supports an IDE HDD mini-drive through the RTM interface
Contains a system reset button through the front bezel faceplate
Supports the RedHat Linux version operating system
Media Module
The media module, according to an illustrative embodiment, is a 6U, 8HP (double-slot) CPCI custom hardware module which acts as the network analysis interface in any system configuration. The role of the media module is generally to monitor a physical network segment, perform various levels of real-time analysis and to report events and statistics to the application server Module via the CPCI backplane. In addition, the media module supports plug-and-play auto detection, assignment of memory address ranges, reporting bus number, slot number, hot-swap and bus-master capabilities, etc. Table 8 lists some of the media module hardware attributes.
TABLE 8
Module conforming to PICMG 2.0 R3.0 Compact PCI Core Specification
Conforms to PICMG 2.1 R2.0 Compact PCI Hot Swap Specification
Supports requirements for a Compact PCI peripheral slot controller
Supports 32-bit, 33 MHz PCI-to-PCI transparent bridge operation
Supports 64-bit, 33 MHz PCI-to-PCI transparent bridge operation
Supports 64-bit, 66 MHz PCI-to-PCI transparent bridge operation
Supports the 6U Euro-card 8HP double slot size (233.35 mm by 160 mm) format
Provides a PowerPC main board processor (850 MHz)
Provides an additional analysis processor (850 MHz)
Supports 1 Gbyte of 64-bit SDRAM capture memory
Supports 1 Gbyte of 64-bit SDRAM main processor memory
Supports 1 Gbyte of 64-bit SDRAM analysis processor memory
Provides hardware accelerated primary packet filtering and DMA
Provides hardware accelerated secondary packet filtering and DMA
Provides shared memory interface between two on-board processors
Provides hardware triggering functions
Contains one 10/100 Ethernet interface through the front bezel faceplate
Supports a serial port through the RTM interface
Supports an IDE HDD mini-drive through the RTM interface
Supports the VxWorks real-time operating system
Rear Transition Modules
FIG. 8 depicts an HDD Rear Transition Module (RTM) 800. The system architecture supports a single RTM for each primary board in the system (i.e. application server or Media Module). The RTM is an ancillary module which provides the functions set forth in Table 9.
TABLE 9
On-board 2.5" (HDD) 802 for the primary module
Auxiliary 10/100 Ethernet interface 804 for the primary module
Auxiliary serial port interface 806 for the primary module
The RTM module 800 may be required for the application server module in some systems, and is optional for each media module in a multi-interface system. FIG. 9A is a drawing of RTM usage in a multi-interface configuration 900. In multi-interface configurations, an RTM 800 may give each media module 902 the ability to perform autonomous capture and statistics logging to disk, which enables multi-segment post-capture analysis without requiring disk sharing.
FIG. 9B depicts RTM 800 usage in a single-interface configuration 920. In a single-interface (probe) configuration, streaming to the Application Server's RTM disk via the backplane may be adequate for this purpose.
PMC Modules
The application server supports multiple general-purpose PMC (daughter-card) modules with connector access through the front bezel.
System Connectors
All primary connectors can be provided via the front bezel of the system boards. The auxiliary connectors (ETH and COM) can also be provided on the RTM modules.
CPCI Bus Usage Model
FIG. 10 depicts CPCI bus transfer modes. The general transfer model taken for the system architecture is to utilize the CPCI backplane 1000 primarily for configuration, statistics, events and post capture (disk) transfers between the Media Module(s) 1002 and the Application Server 1004. The bulk processing of packet data is handled directly by the Media Module 1002, whereby the application server 1004 is essentially responsible for providing statistics and correlated data to the end user or management station. This approach improves performance and scalability.
One exception to this case is if high-speed streaming to disk (RAID) is required, whereby a fiber-channel transceiver module may be placed in the chassis and performs full-rate transfers from a media module 1002 to an off-shelf striped disk array. Other exceptions may arise, such as incorporation of a system SBC, and are not precluded.
Given the high-speed capacity of the CPCI bus (132 Mbytes/S in the slowest configuration), most transfers between the application server 1004 and media modules 1002 can use an "IP over PCI" driver mechanism 1006, allowing a flexible and scalable communications approach. This model still provides approximately 40 Mbytes/S capacity, but greatly extends the system functionality and addressing capability. A "raw-mode" transfer capability 1008 can also be supported for block transfers requiring more speed.
The method used for moving data between the media modules 1002 and application server 1004 can be based on a "pull" model, whereby higher-level entities retrieve data (i.e. statistics and data objects) from the lower-level entities. The lower-level objects are maintained by the media modules 1002 "in-place". Therefore all requests for media module generated objects (from a user or management station) result in the application server 1004 retrieving data directly from the media module(s) 1002 of interest.
Events however are sent upward asynchronously to notify the higher-level entity of data availability, alarms, etc. This prevents a number of media modules from overloading the application server and scales at the system management level as well. This model is applied at the application server to client level as well and is consistent with the SNMP management environment.
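The combination of a "pull" data model with asynchronous upward events can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names, the in-process callback and the event dictionary layout are all assumptions standing in for transfers over the CPCI backplane.

```python
from collections import deque


class MediaModule:
    """Lower-level entity: keeps its statistics objects in place."""

    def __init__(self, slot):
        self.slot = slot
        self.stats = {}          # objects stay on the module until pulled
        self.notify = None       # callback installed by the application server

    def record(self, name, value):
        self.stats[name] = value
        if self.notify is not None:
            # Only a small notification travels upward, not the data itself.
            self.notify({"slot": self.slot, "kind": "data-available", "name": name})


class ApplicationServer:
    """Higher-level entity: queues events and pulls data on request."""

    def __init__(self, modules):
        self.modules = {m.slot: m for m in modules}
        self.events = deque()
        for module in modules:
            module.notify = self.events.append

    def get(self, slot, name):
        """Pull: retrieve the object directly from the module of interest."""
        return self.modules[slot].stats[name]
```

For example, after `module.record("frames", 10)` the server holds one queued event describing what is available, and the value itself is transferred only when a client request causes `server.get(slot, "frames")` to be called.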
Functional Architecture
Whereas the previous section provided an overview of the physical components of an illustrative system architecture, this section will focus on a functional decomposition of the system. This first-level decomposition will include both hardware and software subsystems as functional entities.
Methodology
The system architecture may be open and extensible at every level. To this end, an object-oriented approach has been used in decomposing the system into sets of self-contained subsystems with common interfaces. These subsystems may be overloaded with different components of the same "class" to extend functionality over time without creating additional complexity. This approach applies not only to specific hardware and software components, but also to combined functional entities as a whole. Each of these entities may be viewed as an encapsulated subsystem comprised of hardware, software, or both which provides a particular class of functionality within the system. Many of the diagrams referred to herein assume some level of understanding of the UML (Unified Modeling Language) by the reader. UML is a standard notation for the modeling of real-world objects as a first step in developing an object-oriented design methodology.
FIG. 11 shows an illustrative CPCI related hardware subclassification tree 1100. The subclassification example, while quite simple, illustrates the potential overloading of media modules and CPCI enclosures within the system.
System Operational Environment
The operational environment generally includes the elements listed in Table 10.
TABLE 10
The network under observation
The set of equipment the system interacts with
The set of human clients who will interact with the system
FIG. 12 depicts an operational environment 1200 including a node 1202 along with a set of environmental entities, which it interacts with. These environmental entities will be described in the next subsections.
Observed Network 1204
The network 1204 under observation may include one or more network segments, which may or may not have a logical relationship to one another. Some examples of segments with relation to one another are listed in Table 11.
TABLE 11
Individual physical members of a logical trunk group (e.g. EtherChannel, IMA, etc.)
Redundant or multi-homed backbones
Segments on two sides of a switch (i.e. an aggregation relationship)
Segments on two sides of a router carrying the same traffic (i.e. flow path related)
Etc.
Segments without relation to one another include those listed in Table 12.
TABLE 12
Isolated backbone segments
Links connected to isolated routers and switches (islands)
Etc.
All observed network segments can be monitored via connections with one or more media interfaces, which are in turn realized by media modules in the system.
Environmental Equipment
Environmental equipment that the system can interact with includes three main classes:
1. Supporting equipment
2. Machine clients (i.e. network management systems)
3. Other servers (i.e. RMON probes)
Supporting equipment includes any external equipment that adds feature capability to the node itself in its monitoring role. In FIG. 12, the Modem 1206 and RAID array 1208 are considered to be of this supporting class. Many other types of supporting equipment may be interfaced to through CPCI option boards, PMC modules, or auxiliary interfaces.
Machine clients however, play a different role in that they have direct access to the managed objects of the system. Because of this, they can affect the behavior and state of the node and may be treated with the same security precautions as a human client. Machine clients supported by the node include SNMP managers and CORBA managers.
The application server itself may act as a higher-layer manager to a group of elements, which may be remotely located. In this case, the application server software may be running on a dedicated management workstation and uses CORBA as a direct object-level access protocol. Another example of a CORBA client would be a second level OSI NMS. The ODMG and other bodies have standardized on CORBA as the management interface above the element (EMS) level. The third class of equipment includes RMON probes.
Human Clients
Human clients fall generally into two categories:
1. Those clients who are directly connected to the node via a web browser
2. Those clients who are indirectly connected to a node via an intermediate manager
For clients in the first category, the node provides authentication and access to resources based on user privileges and provisioned policies. For the second type of users (indirect), the intermediate management system provides the majority of authentication and policy enforcement. In this case, the node treats the management machine as a "trusted" user and only enforces provisioned blanket policies for the machine. It should be noted that there may be situations where the node may be required to support both human and machine clients simultaneously. This type of situation is not precluded in the architecture.
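The two access paths described above can be sketched as a single authorization check. The function name, the client dictionary fields and the policy containers are hypothetical; the text does not specify how privileges or blanket policies are represented.

```python
def is_allowed(client, operation, user_privileges, blanket_policy):
    """Decide whether a client may perform an operation on the node.

    client: dict with a "kind" of "direct" (web browser user) or
            "manager" (intermediate management system); hypothetical layout.
    user_privileges: maps a user name to the set of operations it may perform.
    blanket_policy: set of operations allowed to any trusted manager.
    """
    if client["kind"] == "direct":
        # Direct users are authenticated and checked by the node itself.
        if not client.get("authenticated", False):
            return False
        return operation in user_privileges.get(client["user"], set())
    if client["kind"] == "manager":
        # The manager is trusted; only provisioned blanket policies apply,
        # since per-user checks are done upstream by the manager.
        return operation in blanket_policy
    return False
```

A direct user is thus checked against per-user privileges, while a request arriving through an intermediate manager is checked only against the blanket policy provisioned for that machine.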
In addition to these user categories, another sub-classification of users may be required based on how the client uses the node. For the present discussion, this sub-classification pertains to users from the first category (i.e. direct human clients). The sub-classification of these users can be based on the operations each class of user is interested in or allowed to perform. FIG. 13 is a table 1300 that lists these classes.
Application Server Module
The application server Module is the single point of user or management interaction with the monitoring node. In addition, the application server Module acts as the CPCI "system controller" in any configuration; as such, it resides in the system slot of a CPCI chassis.
The hardware for this module can be a Pentium 4 based single board computer running Linux, for example. Table 13 lists some of the features of this module.
TABLE 13
Multi-user, multi-session active web client interface
Enterprise Java Beans based UI servlets
Three-level RMON agent/proxy agent/manager functionality
Multi-interface RMON and Expert correlation capability
Object database for all configuration, event, statistics, alarm, expert, RMON and management objects
Extensible CORBA based communications between all subsystems
Client registry stores per-user session information including triggers, etc.
Multi-level privilege policies provided by security manager
Hardware auto-discovery, version checking and auto-configuration
Per-user logging of alarms, events, statistics and reports
Dedicated Ethernet management interface
Dedicated serial port with command-line interface for administrative and remote dial-up functions
Auxiliary Ethernet interface for non-service affecting maintenance functions (backup, etc.)
The application server is generally responsible for the functions listed in Table 14.
TABLE 14
Acting as the system controller in a CPCI backplane
Performing hardware detection, configuration and version management for Media Modules
Retrieving information from media modules for presentation to clients
Handling and dispatching events (alarms, traps, trigger events) from media modules
Providing a command line interface for initial system configuration and maintenance
Providing all direct (web) user interface functionality via HTTP/JAVA
Providing the primary management interface to machine clients (i.e. SNMP, CORBA, etc)
Providing system and application configuration interface to all human and machine clients
Detecting and reporting system faults (i.e. failed modules, etc.)
User session management (security, authentication, privileges, event registry, etc.)
Maintenance and upgrade functions (SW download adding new features/hardware, etc.)
Providing graphs, reports, topology maps, alarms and statistics to end users
Providing application customization via installable triggers
Providing correlated events and statistics across multiple interfaces (Media Modules)
Providing RMON functionality as a proxy agent for multiple sub-agents (Media Modules)
Providing RMON functionality as a correlation agent for multiple sub-agents (Media Modules)
Hardware Description
As mentioned in a previous section, the application server software can rely on a CPCI single board computer board running Linux. This board is essentially a high-powered workstation on a CPCI module. FIG. 14 is a high-level diagram that shows the basic components 1400-1410 of the application server hardware. Illustrative components are briefly described in Table 15.
TABLE 15
PMC Peripherals 1400 - daughter-cards, I/O through front bezel
Front bezel interfaces 1402 - Standard I/O (mouse, keyboard, SVGA, 10/100 Ethernet)
AS Processor 1404 - e.g., Pentium 3, 850 MHz Intel processor
Main Memory 1406 - 1 Gbyte SODIMM DRAM
Flash Disk 1408 - 128 Mbyte, on-board, non-volatile storage
AS CPCI Interface 1410 - CPCI system controller bridge
Rear Transition Module Interfaces 1412 - 40 Gbyte mini hard-drive, serial and second Ethernet
Software Description
This section describes illustrative software subsystems and interfaces which can make up the application server module. A top-down approach will be used to introduce the overall architecture and each of the constituent subsystems. This architecture should be viewed as a basic model, which can be changed as more focused resources are added to the system.
FIG. 15 shows the application server top-level subsystems and dependencies. In FIG. 15, a set of top-level packages representing major architectural components is shown. In the following subsections, each will be described and further decomposed into additional subsystems with their descriptions. Preferably, the architecture is centered on the common object repository 1504 (and configuration manager 1506). This repository is preferably an active object database, which supports event generation when certain operations are performed on (or attributes change in) active objects. As will be seen, this portion of the architecture is used to support inter-subsystem communications and triggering functions.
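The notion of an active object database, one that generates events when attributes of stored objects change, can be sketched in a few lines. This is an in-memory illustration under stated assumptions: the class and method names are hypothetical, and plain callbacks stand in for the CORBA event delivery described in the text.

```python
from collections import defaultdict

_MISSING = object()  # sentinel: attribute has never been set


class ObjectRepository:
    """Stores attribute values per object and notifies subscribers on change."""

    def __init__(self):
        self._objects = defaultdict(dict)
        self._subscribers = defaultdict(list)

    def subscribe(self, object_id, callback):
        """Register callback(object_id, attribute, value) for one object."""
        self._subscribers[object_id].append(callback)

    def set(self, object_id, attribute, value):
        previous = self._objects[object_id].get(attribute, _MISSING)
        self._objects[object_id][attribute] = value
        # Generate an event only when the stored value actually changes.
        if previous is _MISSING or previous != value:
            for callback in self._subscribers[object_id]:
                callback(object_id, attribute, value)

    def get(self, object_id, attribute):
        return self._objects[object_id][attribute]
```

A subsystem that subscribes to an object is then told about each change without polling, which is one way the repository could support both inter-subsystem communication and triggering.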
A set of common engines 1508 for supporting user interface functions (i.e. logging, statistics, alarm and event managers) is also shown in FIG. 15. These engines each provide a consolidated point for sending common types of information from various sources to the UI servers 1510.
Also shown in FIG. 15 is another set of related subsystems 1511, which handle user session management including security, registering for services, and setting up triggers. A set of subsystems 1512 provide analysis, monitoring and administrative services either directly to clients (i.e. RMON) or through the UI servers. Also shown is the hardware services subsystem 1514, which provides all access to hardware objects (Media Module), including events, configuration, statistics, and maintenance functions. Note that throughout this section it is assumed that inter-subsystem object access is provided through the object repository (via CORBA) and events are passed between subsystems using CORBA.
UI Servers
FIG. 16 shows the UI servers 1510 provided by the Application Server. The UI servers are responsible for providing web clients various UI elements for configuring the system or a session, creating triggers, creating and viewing reports, graphs and logs, viewing alarms, statistics and events, and performing maintenance or administrative functions.
There are two basic user interface presentation classes:
1. Web based UI
2. Serial configuration and administrative UI (command-line interface)
The web-based interface can rely on an Enterprise Java Beans (EJB) framework and can provide dynamic HTML generation via Java Server Pages (JSP) for passive clients. Optionally, the framework can support connections with active clients for providing an event interface and enhanced functionality. In the second case, clients may retrieve active applets (or beans) from the Application Server, which may use Java remote method invocation (RMI) to support real-time event notification and direct operations on the server. In addition, this mechanism allows a greater level of scalability by leveraging the power of the client machine for distributed graphics generation and logging, etc.
The serial UI is essentially a terminal (command-line) interface for administrative and maintenance functions such as setting the IP addresses of the node, running system diagnostics, etc. It should be noted that many of the administrative functions are available through the web interface as well.
FIG. 17 shows the primary run-time flows between application server subsystems and UI servers 1510.
The graphical UI components of FIG. 16 are briefly described in the following subsections.
Log Server 1602
The log server is the element that provides access to log files on a per user basis. Log files provide a time-stamped persistence mechanism for transient data and events. Logs may be created as user specific or as system global. The system global logs may be stored on the application server module, whereas user specific logs can reside on the application server or on the client machine (assuming an active client). The log server provides operations for creating, deleting, enabling and disabling each log. Per-user logs are created by adding alarms, triggers, statistics and events as "logged" in the user's registry entry. Global logs are created by adding alarms, triggers, statistics and events as "logged" in the SYSTEM registry entry. Once a log is created, it is accessible via the log server screens. The logging manager subsystem provides the actual functions for creating and adding entries to logs and dispatching information to the log server.
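The relationship between registry entries marked as "logged" and the resulting per-user and global logs can be sketched as below. The class and method names are hypothetical, and in-memory lists stand in for log files.

```python
import time


class LoggingManager:
    """Appends time-stamped entries to the logs of every owner that
    marked the item as "logged" in its registry entry."""

    SYSTEM = "SYSTEM"  # registry entry that owns the global logs

    def __init__(self):
        self.registry = {self.SYSTEM: set()}  # owner -> names of logged items
        self.logs = {self.SYSTEM: []}         # owner -> list of log entries

    def mark_logged(self, owner, item):
        """Record in the owner's registry entry that `item` is logged."""
        self.registry.setdefault(owner, set()).add(item)
        self.logs.setdefault(owner, [])

    def publish(self, item, payload, timestamp=None):
        """Add a time-stamped entry to every log that includes `item`."""
        stamp = time.time() if timestamp is None else timestamp
        for owner, items in self.registry.items():
            if item in items:
                self.logs[owner].append((stamp, item, payload))
```

Marking an alarm as logged under a user name yields a per-user log, while marking it under `SYSTEM` yields a global one; a log server would then present the contents of `logs` to the appropriate users.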
Graph Server 1604
The graph server is the element that provides access to various graphs on a per user basis. Graphs provide a useful mechanism for viewing of multi-dimensional data. Graphs may be generated based on user specified or system global data and events. The graph server provides operations for creating, deleting, enabling and disabling each graph view. Per-user graphs are created via the user's registry entry. Global graphs are created via the SYSTEM registry entry. The graph server additionally provides functions for creating and adding entries to graphs along with the graph type and criteria. Graphs may be generated using dynamic data or data from log files. In general the graph server receives data from the subsystems listed in Table 16.
TABLE 16
MI Expert Server
RMON Services
Logging Manager
Statistics Manager
Alarm Manager
Event Manager
Report Server 1606
The report server, like the graph server, provides access to report files on a per user basis. Reports may be generated based on user specified or system global data and events. The report server provides operations for creating, deleting, enabling and disabling each report view. The report server additionally provides functions for creating and adding entries to reports along with the report type and criteria. Per-user reports are created via the user's registry entry. Global reports are created via the SYSTEM registry entry. Reports may be generated using dynamic data or data from log files. In general the report server receives data from the subsystems set forth in Table 17.
TABLE 17
MI Expert Server
RMON Services
Logging Manager
Statistics Manager
Alarm Manager
Event Manager
Statistics Server 1608
The statistics server is the element that provides access to groups of statistics on a per user basis. Statistics groups may be created as user specific or as system global. The system global statistics can be stored on the application server module, whereas user specific statistics can reside on the application server or on the client machine (assuming an active client). The statistics server provides operations for creating, deleting, enabling and disabling statistics groups. Adding statistics in the user's registry entry creates per-user groups. Adding statistics in the SYSTEM registry entry creates global groups. Once a statistics group is created, it is accessible via the statistics server screens. The statistics manager subsystem provides the actual functions for creating and adding entries to statistics groups and dispatching information to the statistics server.
Event Server 1610
The event server, like the statistics server, provides access to groups of events on a per user basis. Event groups may be created as user specific or as system global. The system global events may be stored on the application server module, whereas user specific events can reside on the application server or on the client machine (assuming an active client). The event server provides operations for creating, deleting, enabling and disabling event groups. Adding events in the user's registry entry creates per-user groups. Adding events in the SYSTEM registry entry creates global groups. Once an events group is created, it is accessible via the event server screens. The event manager subsystem provides the actual functions for creating and adding entries to event groups and dispatching information to the event server.
Configuration Server 1612
The configuration server provides access to system configuration functions and information. Table 18 lists some of the types of configuration information available.
TABLE 18
Supported hardware and software versions, compatibility rules and default settings
Current hardware and software modules, types, versions, capabilities and status
Supported RMON functions and their status (enabled, etc.)
Supported Expert functions and their status (enabled, etc.)
Supported Administrative functions and their status (enabled, etc.)
User session information
Security and user policy information
User registry information
System and user triggers and their status (enabled, etc.)
Logging capabilities and their status (enabled, etc.)
Statistics capabilities and their status (enabled, etc.)
Alarm capabilities and their status (enabled, etc.)
Event capabilities and their status (enabled, etc.)
The configuration server relies primarily on the configuration manager for accessing system information, but also depends on administrative services and the session manager for controlling access to privileged configuration operations.
Triggers Server 1614
The triggers server is the element that provides access to triggers on a per user basis. Triggers may be created as user specific or as system global. The triggers server provides operations for creating, deleting, modifying, enabling and disabling triggers. The triggers server presents the system events and actions available to triggering functions. Adding triggers to the user's registry entry creates per-user triggers. Adding triggers in the SYSTEM registry entry creates global triggers. Once a trigger is created, it is accessible via the triggers server screens. The triggers manager subsystem provides the actual functions for creating and adding triggers and exchanges events and actions with other subsystems and the object database.
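A trigger, as described above, pairs a system event with an action and can be enabled or disabled per owner. The sketch below is one possible shape for this; the `Trigger` fields and the `TriggersManager` methods are hypothetical names, not taken from the source.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trigger:
    owner: str                       # a user name, or "SYSTEM" for global triggers
    event: str                       # name of the system event to match
    action: Callable[[dict], None]   # invoked with the event details
    enabled: bool = True


class TriggersManager:
    """Holds triggers and runs the actions of those matching an event."""

    def __init__(self):
        self._triggers = []

    def add(self, trigger):
        self._triggers.append(trigger)
        return trigger

    def delete(self, trigger):
        self._triggers.remove(trigger)

    def dispatch(self, event, details):
        """Run every enabled trigger registered for `event`; return the count."""
        fired = 0
        for trigger in self._triggers:
            if trigger.enabled and trigger.event == event:
                trigger.action(details)
                fired += 1
        return fired
```

Disabling a trigger is then just a matter of setting `enabled` to `False`, which keeps it in the registry without letting it fire.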
Alarms Server 1616
The alarms server, like the event and statistics servers, provides access to groups of alarms on a per user basis. Alarm groups may be created as user specific or as system global. The system global alarms may be stored on the application server module, whereas user specific alarms can reside on the application server or on the client machine (assuming an active client). The alarms server provides operations for creating, deleting, enabling and disabling alarm groups. Adding alarms in the user's registry entry creates per-user groups. Adding alarms in the SYSTEM registry entry creates global groups. Once an alarm group is created, it is accessible via the alarms server screens. The alarms manager subsystem provides the actual functions for creating and adding entries to alarm groups and dispatching information to the alarms server.
Decode Server 1618
The decode server provides various views of captured packets in a human readable format. The decode server receives data from the capture manager subsystem.
Administrative Server 1620
The administrative server provides a system administrator with a set of functions for provisioning, maintaining and managing the system. Access to these services is typically restricted from all users except those with administrative privileges. The administrative services subsystem provides the actual functions for administering the system and provides an interface to the administrative server (and the administrative serial UI server). Table 19 lists some of the operations available via the administrative server.
TABLE 19
General system setup and configuration
Access to the SYSTEM entry in the registry
Software download functions
Backup and restore functions
Adding and removing hardware modules
Maintenance functions
Etc.
MI Expert Server 1702 (See FIG. 17)
FIG. 18 is a diagram showing the MI Expert server 1702 and its related subsystems. The MI expert server subsystem is responsible for creating, deleting, enabling and disabling expert monitoring and analysis functions on the application server. There are two basic modes of operation provided by the expert server:
1. Proxy expert mode
2. Multi-interface (MI) expert mode
In the proxy mode (much like the RMON proxy module), the expert server relays expert objects, alarms, statistics and events from media modules to one or more of the UI servers or supporting engines. In MI mode, the expert server collects expert objects, alarms, statistics and events from multiple media modules to perform correlation across multiple interfaces based on rule sets. This second mode may also be used to provide information to the application server RMON agent for correlation MIBs. Additionally, when in MI mode the expert server may request media modules to capture packet data to disk, which may be used to further correlate information across multiple interfaces. It should be noted that both modes could be in operation simultaneously.
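The difference between the two modes can be illustrated with a small sketch. The event layout (a dictionary with `flow` and `interface` keys), the function names and the example rule are all assumptions; the source does not define what a rule set contains.

```python
from collections import defaultdict


def proxy_mode(events):
    """Proxy mode: relay events from media modules unchanged."""
    return list(events)


def mi_mode(events, rules):
    """MI mode: group events by flow and apply each correlation rule.

    Each rule is a callable taking (flow, events_for_flow) and returning
    a correlated result, or None when the rule does not apply.
    """
    by_flow = defaultdict(list)
    for event in events:
        by_flow[event["flow"]].append(event)
    results = []
    for flow, group in by_flow.items():
        for rule in rules:
            outcome = rule(flow, group)
            if outcome is not None:
                results.append(outcome)
    return results


def seen_on_multiple_interfaces(flow, group):
    """Example rule (hypothetical): flag a flow observed on several interfaces."""
    interfaces = sorted({event["interface"] for event in group})
    if len(interfaces) > 1:
        return {"flow": flow, "interfaces": interfaces, "kind": "multi-segment"}
    return None
```

Because the two functions are independent, the same event stream can be relayed in proxy mode and correlated in MI mode at the same time, which matches the note that both modes may operate simultaneously.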
RMON Services 1704 (See FIG. 17)
FIG. 19 depicts an RMON services subsystem 1704 and its primary flows. The RMON services subsystem is responsible for providing access to local MIB objects for external SNMP management systems as well as internal UI servers. There are three basic subsystems provided by the RMON services on the Application Server:
1. Proxy (bridge) module 1902
2. Multi-interface (MI) agent module 1904
3. Manager module 1906
The proxy module (much like the expert proxy mode) relays SNMP objects, alarms, statistics and events from agents on media modules and the MI agent to external SNMP managers, as well as to the local manager module.
The MI agent module provides correlation across multiple interfaces based on rules sets. This second module may use information generated by the MI expert to generate the correlation MIBs, which are available to external managers as well as to the local manager module.
The manager module collects information from the MI agent and the media module agents (and potentially external agents) for presentation to a direct (web) user. The manager module may rely on local engines (logging manager, statistics manager, event manager, alarm manager and capture manager) and the UI servers to provide RMON management views to users.
Note that this is but one illustrative architecture.
Administrative Services
The administrative services subsystem is responsible for providing administrative functions to a (direct) client with administrative privileges. Two user interface servers have access to the services provided by this subsystem:
1. Administrative Serial UI (CLI based)
2. Administrative Server (web based)
In addition, triggers may be configured to perform a subset of administrative functions based on system events, time of day, etc.
The functions listed in Table 20 below are available via the administrative services subsystem.
TABLE 20
Access to the SYSTEM registry entry
System and individual module reset functions
System and module initialization and self-test functions
Hardware installation and maintenance procedures
IP address provisioning
User login and authentication provisioning
Machine client login and authentication provisioning
User privilege levels and policy administration
System backup and restore functions
Software download functions
Type, version and compatibility verification for all hardware and software modules
System status reports
Logging Manager 1706 (See FIG. 17)
FIG. 20 shows the primary flows associated with the logging manager 1706. The logging manager subsystem is responsible for creating and storing system and user logs, which include time-stamped events, alarms, statistics, and other information as requested on a per session basis. In addition, the logging manager provides the requested log information to the log server UI element based on logging criteria in the user and SYSTEM registry entries. The logging manager uses the application server hard drive to persist this data and may additionally use secondary storage (i.e. a file server) for extended capability. It should be noted that equivalent functionality may be provided on each media module when equipped with a local hard drive. In this case, the logging manager on the application server treats each logging manager on the media modules as a remote file server.
Statistics Manager 1708 (See FIG. 17)
The statistics manager 1708 is a common shared resource for all application engines (i.e. RMON, Expert, etc.) on the application server and equivalent functions on the media modules. This subsystem is used to provide (dispatch) statistics to the statistics server, graph server and report server UI elements, as well as to the logging manager. The various statistics may be dispatched based on intervals, change occurrence, etc. as defined in the user and SYSTEM registry entries. This subsystem provides dispatch filtering on a per user basis for multiple client sessions. System triggers may be provided by this subsystem to invoke actions based on statistics. The actual statistics objects are maintained in the object repository.
Alarm Manager 1710 (See FIG. 17)
The alarm manager 1710 is a common shared resource for all application engines (i.e. RMON, Expert, etc.) on the application server and equivalent functions on the media modules. This subsystem is used to provide (dispatch) alarms to the alarms server, graph server and report server UI elements, as well as to the logging manager. The various alarms may be dispatched based on severity, intervals, change occurrence, etc. as defined in the user and SYSTEM registry entries. This subsystem provides dispatch filtering on a per user basis for multiple client sessions. System triggers may be provided by this subsystem to invoke actions based on alarms (i.e. dial a pager, etc.). The actual alarm objects are maintained in the object repository.
Event Manager 1712 (See FIG. 17)
The event manager 1712, like the alarm manager 1710, is a common shared resource for all application engines (i.e. RMON, Expert, etc.) on the application server and equivalent functions on the media modules. This subsystem is used to provide (dispatch) events to the events server, graph server and report server UI elements, as well as to the logging manager. The various events may be dispatched based on severity, intervals, change occurrence, etc. as defined in the user and SYSTEM registry entries. This subsystem provides dispatch filtering on a per user basis for multiple client sessions. System triggers may be provided by this subsystem to invoke actions based on events.
Capture Manager
The capture manager subsystem, like the logging manager, is responsible for creating and storing trace files, which include filtered packets as requested on a per session basis. In addition, the capture manager provides the requested information to various clients, including the decode server UI element, based on capture criteria in the user and SYSTEM registry entries. The capture manager uses the application server hard drive to persist this data and may additionally use secondary storage (i.e. a file server) for extended capability. It should be noted that equivalent functionality may be provided on each media module when equipped with a local hard drive. In this case, the capture manager on the application server treats each capture manager on the media modules as a remote file server.
Object Repository 1504 (see FIG. 15)
FIG. 21 depicts several application server object repository packages 2100. The object repository 1504 is the heart of the application server and is used to store all application server objects. Virtually all application server subsystems use the object repository to store and access their objects. Several types of objects 2102 in the object repository are shown in FIG. 21.
The object repository can also provide active object capabilities meaning that objects may create notification events on creation, deletion or change of state. This functionality may be used as a triggering mechanism allowing virtually any system capability to be invoked by triggers.
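The active object capability can be illustrated with a minimal Python sketch. It assumes a simple callback (observer) arrangement; the class name `ObjectRepository`, its methods, and the event names "created", "changed" and "deleted" are hypothetical and are not taken from the described system.

```python
class ObjectRepository:
    """Hypothetical repository whose objects emit notification events."""

    def __init__(self):
        self._objects = {}
        self._listeners = []  # callbacks standing in for trigger listeners

    def subscribe(self, callback):
        # callback(kind, name, value) is invoked for every notification event.
        self._listeners.append(callback)

    def _notify(self, kind, name, value):
        for callback in self._listeners:
            callback(kind, name, value)

    def put(self, name, value):
        # Storing a new name is a creation; storing an existing name is
        # treated as a change of state.
        kind = "changed" if name in self._objects else "created"
        self._objects[name] = value
        self._notify(kind, name, value)

    def delete(self, name):
        value = self._objects.pop(name)
        self._notify("deleted", name, value)
```

A trigger mechanism could subscribe to such notifications and invoke an action whenever a matching event arrives.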
Configuration Manager
FIG. 22A shows an example managed object containment view 2200 of a node as seen by the application server. FIG. 22B depicts an example managed object containment view 2220 of a media module as seen by the application server.
The configuration manager is responsible for providing all access to managed objects in the system. This includes managing the state and availability of hardware objects, compatibility objects, application objects, administrative, session and security objects, UI objects and trigger objects. The managed objects accessed by the configuration manager are not the actual transient objects produced by applications, but are rather configuration objects, which control and reflect the state of applications, hardware, etc. Note that the media module object is created upon insertion into the chassis. The media module sub-objects reside on the media module.
FIG. 23 is a flow diagram of a process 2300 in which the configuration manager uses the compatibility objects as a rules base for managing version and capability relationships between the system and its modules (hardware and software). In operation 2302, a media module is received into the chassis. The application server detects the module and creates a (root) object for it in operation 2304. The version and capabilities of the module are detected in operation 2306, and in operation 2308, are compared with an entry of its class in the compatibility tree. If the version is incompatible, the new module is disabled in operation 2310 and an alarm is generated in operation 2312. Otherwise, the default configuration is applied to the module in operation 2314 and in operation 2316, the module is activated. The state of the module and all of its sub-objects are now available for further operations. This same process may apply for any additional hardware or software modules.
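The operations of process 2300 can be sketched in Python as follows. This is an illustration only: the compatibility tree is assumed to be a mapping from module class to supported versions, and the names `COMPATIBILITY_TREE`, `ModuleObject` and `admit_module` are hypothetical.

```python
# Hypothetical compatibility tree: module class -> supported versions.
COMPATIBILITY_TREE = {"gigabit-ethernet": {"1.0", "1.1"}}


class ModuleObject:
    """Hypothetical (root) managed object created for an inserted module."""

    def __init__(self, module_class, version):
        self.module_class = module_class
        self.version = version
        self.state = "detected"
        self.config = {}


def admit_module(module_class, version, alarms, default_config=None):
    """Follow operations 2304 to 2316 for a newly inserted module."""
    # Operations 2304/2306: create the root object with the detected version.
    module = ModuleObject(module_class, version)
    # Operation 2308: compare against the entry for the module's class.
    supported = COMPATIBILITY_TREE.get(module_class, set())
    if version not in supported:
        module.state = "disabled"                                   # 2310
        alarms.append(f"incompatible module: {module_class} v{version}")  # 2312
        return module
    module.config = dict(default_config or {})                     # 2314
    module.state = "active"                                         # 2316
    return module
```

In this sketch an unknown module class is treated as incompatible, which is an assumption; the description does not state how an unrecognized class is handled.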
Session Manager
The session manager is responsible for controlling users logging into the system, authenticating them, validating access privileges, etc. The session manager uses the security manager, configuration manager and registry services subsystems to perform much of this functionality. In addition, previously created session configurations may be loaded for the client by the session manager.
Security Manager
The security manager provides authorization levels to users based on provisioned privilege and authentication policies.
Registry Services
The registry services subsystem provides a capability to associate items of interest to individual users of the system or to the system itself. The registry can have two major classes of entries:
1. "User" entry
2. "System" entry
The system entry is a global entry, which can only be accessed by system administrators or users with appropriate privileges. The user entries are created when a user configures a session on the system. In both cases, the types of information listed in Table 21 are maintained in the registry:
TABLE 21
The set of triggers associated with the user or system and their state.
The set of alarm objects the user or system has registered to receive.
The set of event objects the user or system has registered to receive.
The set of statistics objects the user or system has registered to receive.
The set of reports (and their criteria) for the user or system.
The set of graphs (and their criteria) for the user or system.
The set of logs (and their criteria) for the user or system.
In general, items in the SYSTEM registry entry are those that are viewed as "always important" on a global basis. These items may be available for viewing by all users, higher-level managers, etc. or according to individual user policies. The registry therefore creates a type of customizable steering mechanism that prevents events and data that are not of interest to everyone from flooding all clients.
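The steering behavior of the registry can be sketched in Python. The sketch assumes that items registered in the SYSTEM entry are delivered to every user, which is only one of the policies the description allows; the class `Registry`, its methods and the object names are hypothetical.

```python
class Registry:
    """Hypothetical registry with one SYSTEM entry and per-user entries."""

    SYSTEM = "SYSTEM"

    def __init__(self):
        # entry owner -> set of object names the owner registered to receive
        self.entries = {self.SYSTEM: set()}

    def register(self, owner, object_name):
        self.entries.setdefault(owner, set()).add(object_name)

    def recipients(self, object_name):
        """Return the users that should receive the named object."""
        users = [owner for owner in self.entries if owner != self.SYSTEM]
        # Assumption: SYSTEM items are "always important" for all users.
        if object_name in self.entries[self.SYSTEM]:
            return sorted(users)
        # Otherwise only users who registered for the object receive it.
        return sorted(u for u in users if object_name in self.entries[u])
```

An object that no entry has registered for yields an empty recipient list, so it is never dispatched to clients.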
FIG. 24 shows some of the relationships between the registry services 2400 and other subsystems. FIG. 25 depicts registry entry object associations 2500.
Triggers Manager 1714 (See FIG. 17)
FIG. 26 shows a collection of triggers 2602 and trigger groups 2604. The triggers manager 1714 is indirectly responsible for the creation, deletion, activation and deactivation of triggers and directly responsible for the scheduling and invocation of actions based on triggers. This includes listening for events for enabled triggers, evaluating conditions required to fire the trigger, and invoking the action(s) for the trigger. The set of triggerable events and actions needs to be published by each subsystem via the configuration manager (i.e. through the managed objects for the subsystem). Trigger groups may be created per-user or globally via the registry.
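The listen, evaluate and invoke cycle described above can be sketched in Python. The trigger representation (a dictionary holding a condition, an action and an enabled flag) and the names `TriggersManager`, `add_trigger` and `on_event` are assumptions made for illustration.

```python
class TriggersManager:
    """Hypothetical manager that fires actions for enabled triggers."""

    def __init__(self):
        self._triggers = {}  # event name -> list of trigger dictionaries

    def add_trigger(self, event, condition, action, enabled=True):
        trigger = {"condition": condition, "action": action, "enabled": enabled}
        self._triggers.setdefault(event, []).append(trigger)
        return trigger

    def on_event(self, event, payload):
        """Evaluate every enabled trigger for the event; return the fire count."""
        fired = 0
        for trigger in self._triggers.get(event, []):
            # A trigger fires only when it is enabled and its condition holds.
            if trigger["enabled"] and trigger["condition"](payload):
                trigger["action"](payload)
                fired += 1
        return fired
```

For example, a trigger registered for an "alarm" event with the condition `severity >= 3` invokes its action only for sufficiently severe alarms, and stops firing once its enabled flag is cleared.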
Hardware Services 1716 (See FIG. 17)
The hardware services subsystem provides all event and object communication between the application server and other system modules. This includes CPCI backplane drivers, hardware detection and initial configuration, interrupts, data transfers, etc. Table 22 lists two mechanisms for communication over the CPCI backplane.
TABLE 22 IP over PCI Native PCI (memory mapped)
The first mechanism allows the application server flexible access to all media modules in the system using an IP transport. This mode can be used to provide RMON (SNMP) access to agents on media modules and supports other direct object access protocols. Since the majority of traffic between media modules and the application server is based on configuration, events and statistics the performance is adequate. The second mechanism provides a "raw" transfer mode using the PCI (memory mapped) target/initiator approach. In this mode, very high-speed shared memory transfers are possible using the PCI burst DMA mechanism. This mode may be useful for accessing trace files captured to disk on the media modules, etc.
Media Module
The media module is effectively a single-board, real-time monitor/analyzer and is the single point of network monitoring for the monitoring node. In addition, the media module acts as a CPCI (master/slave) "peripheral controller" in any configuration and as such it may always reside in a peripheral slot of a CPCI chassis. The hardware for this module includes multiple microprocessors, FPGAs and other application-specific circuitry. The media module supports Gigabit Ethernet (and others). The main processor on the media module can run a real-time embedded OS (VxWorks). Table 23 lists some of the features of this module.
TABLE 23
Two fully independent pipelined RISC processors providing over 1.6 GHz total performance
Common, reusable base design (media independent portion)
Application-specific PMD subsystem encapsulates all media-dependent functionality
Dedicated FPGA engines for PMD, capture, filtering and other HW assist functions
Flexible multi-stage HW filtering including adaptive modes for loss-less flow processing
Wire-speed capability for capture and low-level statistics
Multi-level RMON functionality - RMON 1, RMON 2, RMON 3 (TPM and APM)
Multi-level Expert monitoring - Media, Network, Transport, Session, Service and APM
Multi-mode adaptive filtering for Expert functions
Per-application time-slice priority scheduling for "Roving Expert" mode
On demand enabling of additional expert functions in diagnostic modes
On-board RMON agent functionality
Flexible triggers support for application customization
Persistent logging of alarms, events, statistics and reports
Optional secondary (HDD) capture storage
Dedicated supplementary Ethernet management interface
The media module is generally responsible for the functions listed in Table 24.
TABLE 24
Acts as self-contained monitor/probe in system
Provides capability, configuration and version information to application server
Dispatches events (alarms, traps, trigger events) to application server
Provides all monitoring functions for one or more network segments
Provides RMON functionality as a "virtual probe"
Provides maintenance and upgrade functions (SW download, new features/hardware, etc.)
Provides statistics, alarms, events, traces, RMON and expert objects to application server
Provides application customization via installable triggers
The media module hardware and software architecture is optimized based on three main functions:
1. Flow Classification
2. RMON (1, 2, APM and TPM)
3. Expert Monitoring (APM, TPM and diagnostics)
where 1, 2 and 3 above are interrelated as set forth in Table 25 and as shown in FIG. 27, which depicts the major subsystems of a media module 2700 and their dependencies.
TABLE 25
Flow classification is a core function used by RMON and Expert applications
Expert is a core function used by the APM, TPM and other components of RMON
Expert provides advanced APM functions (i.e. added value above RMON APM)
RMON and Expert interfaces are provided to the application server for access and presentation
As will be seen in the following sections, the media module is architected to optimize performance for each of these functions. This optimization consists of application specific hardware, distributed filtering and partitioning of software on multiple processors to provide the highest levels of run-time performance. The majority of this optimization revolves around the flow classification function, as this is central to all other functions on the media module.
Hardware Description
As mentioned in a previous section, the media module is preferably a CPCI single board hardware/real-time software module. This board is essentially a high-powered monitor/analyzer on a CPCI module. FIG. 28 is a high-level diagram that shows the basic components of media module hardware and dependencies. Each of the hardware components and subsystems will be described in the following sections.
PMD Subsystem 2802
FIG. 29 shows a top-level view of the PMD subsystem 2802. The PMD subsystem provides the items listed in Table 26.
TABLE 26
A low-level protocol termination (e.g. GbE, ATM, POS, etc.) for each interface
Configuration for each interface according to the application
Alarms, statistics and counts for each interface and protocol termination
Filters for including or excluding low-level protocol units for further processing
Tables for associating endpoints or connections with their respective errors, counts and statistics
Signaling termination for media types that contain control flows (i.e. ATM, etc.)
Synchronizes to external timing sources for frequency traceability (timestamp correlation)
Packet reassembly for processing by the flow classification engine
Pre-pending each packet with a timestamp/status descriptor
Multiplexing packets from multiple interfaces into a single packet stream (PLI)
Performing flow control and elastic buffering for timing decoupling
Associated with each PMD type is a "media expert" function, which both encapsulates and provides a well-defined interface to the above functions. The media expert may be implemented as a combination of hardware and software. The software portion may be implemented in a dedicated task on the media module main processor, or in a dedicated PMD processor. For simpler protocols (Ethernet, etc.) the task approach can be used, whereas for more complicated protocols (that involve complex signaling), a dedicated PMD processor is preferable. In addition, the PMD is responsible for providing a packet-level interface to the flow classification engine. Since the flow classifier only understands packets, any cell or other transport streams may be reassembled prior to presentation to the capture control interface.
The PMD subsystem prepends each packet passed on to the capture subsystem with a descriptor containing the information listed in Table 27.
TABLE 27
Timestamp
Frame type (control, etc.)
Interface ID and direction
Error status (i.e. too short, too long, etc.)
Original length
Truncated length
Total length (including prepended descriptor)
Etc.
In addition the PMD maintains all interface counts appropriate to the media (packets, bytes, too long, too short, etc.) as well as any alarm status and control.
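A prepended descriptor carrying the fields of Table 27 can be sketched in Python with the standard `struct` module. The field widths, their order and the byte layout below are assumptions chosen for illustration; the description does not specify a descriptor format, and `prepend_descriptor` and `snap_len` are hypothetical names.

```python
import struct

# Assumed layout (network byte order, no padding):
#   Q  timestamp            B  frame type     B  interface ID
#   B  direction            B  error status   H  original length
#   H  truncated length     H  total length (descriptor plus stored bytes)
DESCRIPTOR = struct.Struct("!QBBBBHHH")


def prepend_descriptor(packet, timestamp, frame_type, interface_id,
                       direction, error_status, snap_len):
    """Truncate the packet to snap_len bytes and prepend the descriptor."""
    truncated = packet[:snap_len]
    header = DESCRIPTOR.pack(
        timestamp, frame_type, interface_id, direction, error_status,
        len(packet),                        # original length on the wire
        len(truncated),                     # truncated (stored) length
        DESCRIPTOR.size + len(truncated),   # total length with descriptor
    )
    return header + truncated
```

Because the 16-bit length fields are an assumption of this sketch, it only handles packets shorter than 65,536 bytes.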
Physical Interfaces 2902
The physical interfaces may be optical or electrical, depending on the media type. For Gigabit Ethernet, these interfaces can be optical and can be provided by GBIC devices.
External Timing Interface 2904
The timing interface provides a mechanism to use an outside timing source for providing per-packet timestamps. This may be used to synchronize the timing across multiple media modules in different locations. The external timing interface may be provided to all media modules in a shelf system by a set of predefined signals on the CPCI backplane. The source of these timing signals can be an optional GPS (or other) timing module.
uP Interface 2906
The uP interface provides the media module (main) processor access to all configuration and status registers, memories, etc. for the PMD. In cases where a dedicated PMD processor exists, this interface may utilize a shared memory mechanism.
Packet Level Interface 2910
The packet level interface is used for transferring pre-filtered packets to the capture subsystem. This interface provides a unified (multiplexed) stream containing packets received from all physical interfaces that are destined for capture or queuing. This interface either provides timing to or receives timing from the capture subsystem. Buffering within the PMD resolves the timing boundary issues across this interface. The capture subsystem can use a demand-driven transfer mechanism to retrieve packets when available from the PMD.
Capture Subsystem 2804 (See FIG. 28)
The capture subsystem provides filtering and buffering for packets received from the PMD, an interface to the flow processor for accessing packets in the capture buffer and an interface for forwarding a selected subset of the captured packets to the focus buffer. In this respect, the capture subsystem provides a triple-ported interface to the capture buffer. FIG. 30 shows a top-level view of the capture subsystem 2804.
The capture subsystem provides the functions listed in Table 28.
TABLE 28
Packet buffering (1 Gbyte) supporting multiple operating modes
Raw-mode capture at wire speed (for Gigabit)
Wire-speed packet filtering supporting multiple operating modes
Wire-speed priority queuing for selected flows (128K priority flows)
Packet transfer (DMA) into capture buffer from PMD subsystem
Packet transfer (DMA) from capture buffer to focus buffer
Packet transfer (DMA) from capture buffer to flow processor via uP interface
Direct access (non-DMA) for flow processor via uP interface
Hardware triggers for starting and stopping capture in diagnostic mode
Packet Level Interface 3002
The packet level interface is the source of all packet data to be processed by the capture subsystem. The capture subsystem retrieves packets from the PMD whenever packets are available as indicated by the PMD. This interface uses DMA to transfer packets into the capture buffer after parsing and filtering each received packet.
uP Interface 3004
The uP interface provides the media module (flow) processor access to all configuration and status registers, memories, etc. for the capture subsystem. This interface is the source of all packet data to be processed by the flow processor and is controlled exclusively by the flow processor. This includes setting up filters and triggers, managing queues and initiating DMA transfers for forwarding selected packets on to the focus buffer. This interface can support an on-demand hardware packet transfer mechanism (DMA) into the flow processor's local memory to alleviate timing contention for the capture buffer.
Focus Buffer Interface 3006
The focus buffer interface is used for transferring packets from the capture buffer into the focus buffer. This forwarding uses DMA and is under control of the flow processor. Operationally, once the flow processor has analyzed a packet in the capture buffer, a decision is made whether to forward the packet on or not. If the packet is to be forwarded, the flow processor initiates the transfer across this interface. A control mechanism can exist to indicate when the focus buffer is full.
Capture modes
The capture subsystem provides two primary modes of operation, and several sub-modes within each primary mode. The primary modes are listed in Table 29.
TABLE 29 Diagnostic Mode Monitoring Mode
In diagnostic mode the capture buffer takes snapshots of data from the line and provides basic (pattern) filtering capabilities. The buffer modes supported in diagnostic mode include those listed in Table 30.
TABLE 30 Fill and stop Wrap
In fill and stop mode, when a capture is initiated (usually by a trigger), the buffer fills linearly until full or a stop trigger is fired. In the wrap mode, the buffer is continuously being overwritten with the most recent data from the line until a stop trigger is fired. The start and stop capture triggers are implemented in hardware and support stop after N (bytes) capability. This allows a user defined capture window with information both before and after the event of interest.
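The two diagnostic buffer modes and the stop after N (bytes) capability can be modeled in software as follows. The described triggers are implemented in hardware; this Python sketch only mirrors their behavior, and the class `DiagnosticCaptureBuffer`, its method names and its packet-granular accounting are assumptions.

```python
from collections import deque


class DiagnosticCaptureBuffer:
    """Software model of the fill-and-stop and wrap buffer modes."""

    def __init__(self, capacity_bytes, mode="fill_and_stop"):
        if mode not in ("fill_and_stop", "wrap"):
            raise ValueError("unknown mode: " + mode)
        self.capacity = capacity_bytes
        self.mode = mode
        self.packets = deque()
        self.used = 0
        self.stopped = False
        self._bytes_until_stop = None  # armed by a stop-after-N trigger

    def stop_trigger(self, after_bytes=0):
        """Stop immediately, or after roughly after_bytes more bytes."""
        if after_bytes <= 0:
            self.stopped = True
        else:
            self._bytes_until_stop = after_bytes

    def store(self, packet):
        size = len(packet)
        if self.stopped or size > self.capacity:
            return False
        if self.used + size > self.capacity:
            if self.mode == "fill_and_stop":
                self.stopped = True  # buffer is full: capture ends
                return False
            # Wrap mode: overwrite the oldest data with the newest.
            while self.used + size > self.capacity:
                self.used -= len(self.packets.popleft())
        self.packets.append(packet)
        self.used += size
        if self._bytes_until_stop is not None:
            self._bytes_until_stop -= size
            if self._bytes_until_stop <= 0:
                self.stopped = True
        return True
```

In wrap mode, firing `stop_trigger(after_bytes=N)` at the event of interest leaves the buffer holding data from before the event plus about N bytes from after it, which gives the user-defined capture window.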
In monitoring mode, the capture buffer acts as a high performance FIFO queue. Table 31 below lists buffer modes supported in monitoring mode.
TABLE 31 Priority queuing Non-priority queuing
In priority queuing mode, the buffer is segmented into two virtual queues: priority and non-priority. Each queue maintains and is accessed by separate head, tail and current offset pointers. Associated with the priority queue is a priority filter table (CAM), which contains information pertaining to the priority flows (e.g. address pairs, etc.). The buffer space for each queue varies dynamically based on the arrival of packets that meet the priority criteria (i.e. have an entry in the priority filter). Initially all packets are considered non-priority, but as the flow processor identifies a flow as being "important", information about the stream of packets that comprise the flow is written back to the queue manager and tagged as priority.
As the number of priority flows increases, buffers are reallocated to the priority queue from the non-priority queue. Likewise when the number of priority flows decreases, buffers are reallocated to the non-priority queue. These queues effectively appear as separate FIFOs with varying depth and are completely managed by hardware.
This mechanism allows the flow processor to focus on servicing priority packets over non-priority packets to prevent data loss. To manage the aggregate packet rate and avoid dropped packets, the flow processor monitors the average depth of the priority queue and may selectively discard flows from the priority filter.
In the non-priority queuing mode, the capture buffer appears as a single FIFO and gives no particular preference to the packets being captured. Packets are therefore likely to be dropped in this mode.
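The priority queuing mode can be modeled as two queues drawing on one shared pool of buffers. The following Python sketch is a software analogy of a mechanism that the description says is managed entirely by hardware. It assumes that a priority packet arriving when the pool is exhausted reclaims the oldest non-priority buffer; that reclaim policy and the names `PriorityCaptureQueue`, `tag_priority` and `untag_priority` are hypothetical.

```python
from collections import deque


class PriorityCaptureQueue:
    """Two virtual FIFOs sharing a fixed pool of packet buffers."""

    def __init__(self, total_buffers):
        self.free = total_buffers
        self.priority = deque()
        self.non_priority = deque()
        self.priority_filter = set()  # stands in for the CAM filter table
        self.dropped = 0

    def tag_priority(self, flow_key):
        # Written back by the flow processor once a flow is "important".
        self.priority_filter.add(flow_key)

    def untag_priority(self, flow_key):
        # Lets the flow processor shed load by discarding a priority flow.
        self.priority_filter.discard(flow_key)

    def enqueue(self, flow_key, packet):
        is_priority = flow_key in self.priority_filter
        if self.free == 0:
            if is_priority and self.non_priority:
                # Assumed policy: reallocate a buffer to the priority queue.
                self.non_priority.popleft()
                self.dropped += 1
                self.free += 1
            else:
                self.dropped += 1
                return False
        self.free -= 1
        queue = self.priority if is_priority else self.non_priority
        queue.append((flow_key, packet))
        return True

    def dequeue(self):
        """Service priority packets before non-priority packets."""
        queue = self.priority or self.non_priority
        if not queue:
            return None
        self.free += 1
        return queue.popleft()
```

With a single FIFO and no filter (the non-priority queuing mode), every arrival competes equally for buffers, which is why drops are more likely in that mode.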
Filtering Modes
The capture subsystem supports various hardware filtering capabilities depending on operating mode (i.e. diagnostic or monitor). In any mode, a dedicated 72 bit wide content addressable memory (CAM) is used to provide the filtering on 128K flows. In diagnostic mode, patterns may be entered into the CAM based on information contained in Table 32.
TABLE 32
Information in the PMD prepended descriptor (i.e. errored, interface ID, etc.)
Information contained in the DLC header (i.e. addresses, etc.)
Information contained in the L3 header (i.e. addresses, etc.)
Information contained in higher-layer headers (under evaluation)
In monitoring mode, the CAM is used as a priority flow recognition mechanism, which allows the flow processor to give priority to a set of flows that contain the provisioned L3 (or other) address pairs corresponding to packets of interest. Normally, the criterion for flows of interest is an unbiased rate-throttling mechanism, whereby a population of flows is given priority based on having already been classified. This mechanism may be extended, however, by biasing the priority filter to focus on a set of flows that have some significance to the flow processor or other entity. In this case, only flows that match the focus criteria are given priority, effectively filtering out other "non-interesting" flows.
Flow Processor Subsystem 2806 (See FIG. 28)
The media module flow processor is a microprocessor subsystem dedicated to the task of flow classification. This processor is the main client of the capture buffer and pre-processes all packets for further analysis by the main processor. This processor stores the results of classification in shared memory and builds a descriptor for each packet forwarded on to the main processor (through the focus buffer). Tasks on the main processor may identify a flow as being important by tagging its flow record in the shared memory, which the flow processor subsequently uses as criteria for forwarding additional packets of that flow. This mechanism provides another type of adaptive filtering capability to reduce the probability of dropped packets for post-classification analysis. This processor can have its own dedicated program and data memories as well as access to the shared memory. The processor may or may not require an OS.
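The feedback loop between the flow processor and the main processor can be sketched in Python. The sketch assumes that a flow is keyed by a direction-independent address pair and that the first packet of every new flow is forwarded so that the main processor can decide whether the flow is important; both choices, along with the names `FlowRecord` and `FlowProcessor`, are assumptions rather than details from the description.

```python
class FlowRecord:
    """Hypothetical per-flow record kept in shared memory."""

    def __init__(self):
        self.packets = 0
        self.bytes = 0
        self.important = False  # tagged by tasks on the main processor


class FlowProcessor:
    def __init__(self):
        self.shared_memory = {}  # flow key -> FlowRecord
        self.focus_buffer = []   # (descriptor, packet) pairs forwarded on

    def classify(self, src, dst, packet):
        # Both directions of a conversation map to the same flow key.
        key = (src, dst) if src <= dst else (dst, src)
        is_new = key not in self.shared_memory
        record = self.shared_memory.setdefault(key, FlowRecord())
        record.packets += 1
        record.bytes += len(packet)
        # Forward the first packet of a flow (assumed) and every packet of
        # a flow that the main processor has tagged as important.
        if is_new or record.important:
            descriptor = {"flow": key, "length": len(packet)}
            self.focus_buffer.append((descriptor, packet))
        return key
```

Tagging a record (`shared_memory[key].important = True`) plays the role of the write-back from the main processor, after which additional packets of that flow reach the focus buffer.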
Main Processor Subsystem 2808 (See FIG. 28)
The media module main processor can be, for example, an 800 MHz PowerPC dedicated to providing general application support for the media module. In addition, the main processor subsystem provides the functionality set forth in Table 33.
TABLE 33
All expert monitoring/analysis functions using results from the flow processor
RMON (1, 2 and APM) agent functionality via results from the flow processor and expert
Provides all access to the focus buffer (e.g. for the expert task)
Executes all trigger functions, with the exception of hardware triggers
Provides alarm, event and object access services to application server
Provides persistence and aggregation for transient (expert and flow) objects as required
Provides configuration interface to the application server as well as local applications
Provides FLASH based storage for critical configuration information
1 Gbyte of main (SDRAM) memory
Manages and shares data for all CPCI bus access
Provides 10/100 Ethernet interface
Encapsulation of all filtering and capture diagnostic services
All self-test and maintenance functions
This processor can run the VxWorks real-time embedded operating system.
Shared Memory Subsystem 2810 (See FIG. 28)
FIG. 31 shows a top-level view of the shared memory subsystem 2810. The shared memory subsystem provides a data and event communication mechanism between the flow processor and the main processor. This memory is made equally available to the two processors via arbitration. All flow records created by the flow processor are stored in this memory in addition to per-packet parse descriptors. The descriptors are queued to allow the main processor to perform asynchronous processing of packets from the flow processor. In addition, the main processor may write-back pointers and flow control (filter) information in the shared flow records as a feedback mechanism for selecting a focus set. This subsystem also serves as the download, configuration and status mechanism for the flow processor and FPGAs.
Focus Subsystem 2812 (See FIG. 28)
The focus subsystem provides buffering for packets received from the capture subsystem and an interface to the main processor for accessing those packets in the focus buffer. In effect, the focus subsystem provides a dual-ported interface to the focus buffer. FIG. 32 shows a top-level view of the focus subsystem 2812.
The focus subsystem provides the functionality listed in Table 34.
TABLE 34
Packet buffering (512M byte) supporting multiple operating modes
Post-classification capture mode
Classification based priority queuing for selected flows
Packet transfer (DMA) from focus buffer to main processor via uP interface
Direct access (non-DMA) for main processor via uP interface
Hardware triggers for starting and stopping focus capture in diagnostic mode
uP Interface 3202
The uP interface provides the media module (main) processor access to all configuration and status registers, memories, etc. for the focus subsystem. This interface is the source of all packet data to be processed by the main processor (expert, etc.) and is controlled exclusively by the main processor. This interface can support an on-demand hardware packet transfer mechanism (DMA) into the main processor's local memory to alleviate timing contention for the focus buffer.
Capture Buffer Interface 3204
The capture buffer interface is used for transferring packets from the capture buffer into the focus buffer. This forwarding uses DMA (in the capture subsystem) and is under control of the flow processor. Operationally, once the flow processor has analyzed a packet in the capture buffer, a decision is made whether to forward the packet on or not. This decision is based on indications fed back from the expert task on main processor for the scope (flows) expert is interested in and is effectively a second level of filtering. If the packet is to be forwarded, the flow processor initiates the transfer across this interface. A control mechanism may be provided to indicate when the focus buffer is full.
Focus Buffer Modes
Like the capture subsystem, the focus subsystem provides two primary modes of operation, and several sub-modes within each primary mode. The primary modes are listed in Table 35 below.
TABLE 35
Diagnostic Mode
Monitoring Mode
In diagnostic mode the focus buffer takes snapshots of data from the capture buffer based on classification (i.e. multi-layer) filtering provided by the flow processor. The buffer modes supported in diagnostic mode are listed in Table 36.
TABLE 36
Fill and stop
Wrap
In fill and stop mode, when a capture is initiated (usually by a trigger), the buffer fills linearly until full or a stop trigger is fired. In the wrap mode, the buffer is continuously being overwritten with the most recent data from the line until a stop trigger is fired. The start and stop capture triggers are implemented in hardware and support stop after N (bytes) capability. This allows a user defined capture window with information both before and after the event of interest.
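By way of illustration only, the two diagnostic buffer modes and the "stop after N" behavior may be modeled in software as follows. This is a minimal sketch and not the hardware design: the class name DiagnosticBuffer, its method names, and the choice to measure capacity in packets rather than bytes are all assumptions made for the example.

```python
from collections import deque


class DiagnosticBuffer:
    """Toy model of the fill-and-stop and wrap modes (hypothetical names)."""

    def __init__(self, capacity, mode):
        if mode not in ("fill_and_stop", "wrap"):
            raise ValueError(f"unknown mode: {mode}")
        self.capacity = capacity  # assumption: capacity counted in packets
        self.mode = mode
        # In wrap mode a bounded deque discards the oldest packet when full.
        self.packets = deque(maxlen=capacity if mode == "wrap" else None)
        self.stopped = False
        self.bytes_left_after_trigger = None  # armed by stop_trigger()

    def stop_trigger(self, stop_after_bytes=0):
        """Fire the stop trigger, optionally capturing N more bytes first."""
        if stop_after_bytes <= 0:
            self.stopped = True
        else:
            self.bytes_left_after_trigger = stop_after_bytes

    def capture(self, packet):
        """Store one packet; return False once capture has stopped."""
        if self.stopped:
            return False
        if self.mode == "fill_and_stop" and len(self.packets) >= self.capacity:
            self.stopped = True  # buffer filled linearly until full
            return False
        self.packets.append(packet)
        if self.bytes_left_after_trigger is not None:
            self.bytes_left_after_trigger -= len(packet)
            if self.bytes_left_after_trigger <= 0:
                self.stopped = True
        return True
```

In wrap mode, firing the stop trigger with a nonzero byte count leaves the buffer holding packets from both before and after the trigger, which corresponds to the user defined capture window described above.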
In monitoring mode, the focus buffer acts as a high performance FIFO queue. Table 37 lists buffer modes supported in monitoring mode.
TABLE 37
Priority queuing
Non-priority queuing
In priority queuing mode, the buffer is segmented into two virtual queues: priority and non-priority. Each queue maintains, and is accessed by, separate head, tail and current offset pointers. Associated with the priority queue is a priority tagging mechanism provided by the flow processor, which is based on which flows are important to expert. The buffer space for each queue varies dynamically based on the arrival of classified packets that meet the priority criteria (i.e. have a priority entry in the flow classifier).
Initially all packets are considered non-priority, but as the expert task identifies a flow as being "important", information about the stream of packets that comprise the flow is written back to the flow processor and tagged as priority.
As the number of priority flows increases, buffers are reallocated to the priority queue from the non-priority queue. Likewise when the number of priority flows decreases, buffers are reallocated to the non-priority queue. These queues effectively appear as separate FIFOs with varying depth and are completely managed by hardware.
This mechanism allows the expert task to focus on servicing priority packets over non-priority packets to prevent data loss. To manage the aggregate packet rate and avoid dropped packets, the expert task monitors the average depth of the priority queue and may selectively discard flows from the priority filter.
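The priority queuing behavior described above may be sketched in software as two FIFOs drawing on one shared pool of buffers. The sketch is illustrative only; in the described embodiment the queues are managed entirely by hardware. The class name FocusQueues, its methods, and the specific reclaim policy (a priority packet arriving at a full pool takes a buffer from the oldest non-priority packet) are assumptions, not details taken from the specification.

```python
from collections import deque


class FocusQueues:
    """Two virtual FIFOs sharing one buffer pool (hypothetical model)."""

    def __init__(self, total_buffers):
        self.total_buffers = total_buffers
        self.priority_flows = set()  # flows the expert has tagged as important
        self.priority = deque()
        self.non_priority = deque()

    def tag_priority(self, flow_id):
        self.priority_flows.add(flow_id)

    def untag_priority(self, flow_id):
        # The expert may discard a flow from the priority filter under load.
        self.priority_flows.discard(flow_id)

    def enqueue(self, flow_id, packet):
        """Queue a packet; return False if it had to be dropped."""
        is_priority = flow_id in self.priority_flows
        if len(self.priority) + len(self.non_priority) >= self.total_buffers:
            if is_priority and self.non_priority:
                # Assumed policy: reallocate a buffer to the priority queue.
                self.non_priority.popleft()
            else:
                return False
        target = self.priority if is_priority else self.non_priority
        target.append((flow_id, packet))
        return True

    def dequeue(self):
        """Service priority packets ahead of non-priority packets."""
        if self.priority:
            return self.priority.popleft()
        if self.non_priority:
            return self.non_priority.popleft()
        return None

    def priority_depth(self):
        return len(self.priority)
```

An expert task modeled on this sketch could poll priority_depth() and call untag_priority() for selected flows when the depth stays high, mirroring the depth monitoring described above.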
In the non-priority queuing mode, the focus buffer appears as a single FIFO and gives no particular preference to the packets being captured other than through flow filtering. Packets are therefore more likely to be dropped in this mode.
Filtering Modes
It should be noted that unlike the capture subsystem, the focus subsystem does not provide hardware filtering. Instead, filtering is achieved using a software feedback approach. In this approach, the flow processor is directed by the main processor (expert) as to the focus set of applications, etc. that are forwarded on for expert processing. In addition, the priority queuing of a subset of flows within the focus set is used to provide additional filtering capability.
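The software feedback approach may be illustrated with the following minimal sketch, in which the expert writes its focus set and priority flows back to a model of the flow processor, and the flow processor consults them for each classified packet. The names FlowProcessorFilter, set_focus, tag_priority and classify, and the three string results, are hypothetical and chosen only for the example.

```python
class FlowProcessorFilter:
    """Model of feedback-driven forwarding decisions (hypothetical names)."""

    def __init__(self):
        self.focus_set = set()       # applications forwarded for expert processing
        self.priority_flows = set()  # subset of flows given priority queuing

    def set_focus(self, applications):
        """Called on behalf of the expert to replace the focus set."""
        self.focus_set = set(applications)

    def tag_priority(self, flow_id):
        """Called on behalf of the expert to mark a flow as important."""
        self.priority_flows.add(flow_id)

    def classify(self, application, flow_id):
        """Decide what happens to a packet already classified to a flow."""
        if application not in self.focus_set:
            return "drop"  # not forwarded to the focus buffer
        if flow_id in self.priority_flows:
            return "priority"
        return "non_priority"
```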
HDD 2814 (See FIG. 28)
The media module has the ability to use an optional hard drive for the persistent storage of various data. Table 38 lists some of the uses for the HDD module.
TABLE 38
Storing RMON history
Storing expert history
Storing alarm and event logs
Storing aggregated objects
Storing capture data for the MI expert (or other app) on the application server
Storing capture data for post-capture analysis by a sniffer, etc.
The HDD (when equipped) resides on a CPCI rear transition module directly behind the media module. The media module provides an IDE interface on a set of user defined CPCI backplane signals.
CPCI Interface 2816 (See FIG. 28)
The CPCI backplane interface on the media module can be used for all communications with the application server or other client modules. This interface may be set up in transparent or non-transparent modes and provides both target and initiator capabilities. The main processor memory is made accessible to the application server via this interface for general communication (configuration, download, status, etc.) and any shared object access. This interface also allows the application server access to the focus buffer and local HDD.
Ethernet Interface 2818 (See FIG. 28)
The media module provides a dedicated 10/100 interface via the front bezel, which may be used for debugging, alternate access for management systems, etc.
Software Description
This section will describe the software subsystems and interfaces which comprise the media module. A top-down approach will be used to introduce the overall architecture and each of the constituent subsystems. This architecture should be viewed as an illustrative model, which can be changed as more focused resources are added to the development.
FIG. 33 shows top-level subsystems and dependencies of a media module 3300 according to one embodiment. In FIG. 33, a set of top-level packages representing major architectural components is shown. In the following subsections, each will be described and further decomposed into additional subsystems with their descriptions. As can be seen, the architecture is centered on the common data repository 3302 (and configuration manager 3304). This repository is viewed as being a shared memory database, which is accessible by all subsystems. As will be seen, this is an important part of the architecture for supporting inter-subsystem communications and triggering functions.
With continued reference to FIG. 33, a set of common engines 3306 are provided for supporting generic functions (i.e. logging, statistics, alarm and event managers). These engines each provide a consolidated point for managing and maintaining common types of information from various sources for local subsystems and the application server. A set of subsystems 3308 provide analysis, monitoring and triggering services either directly to clients (i.e. expert to RMON) or to the application server. A hardware services subsystem 3310 provides all access to hardware objects (interfaces, HDD, etc.), including events, configuration, statistics, and maintenance functions. Note that throughout this section it is assumed that inter-subsystem object access is provided through the data repository and events are passed between subsystems using OS or hardware mechanisms.
Persistence Manager 3312 (See FIG. 33)
The persistence manager is responsible for gathering any transient objects that require storage beyond their active state. For example, APM requires that objects related to flows (connection between client, server and application) be aggregated beyond the life of a single flow involving the three parts. This requires a type of medium term persistence so that a client may view the behavior of the flow over time. A longer-term persistence (i.e. indefinite) may also be provided for history and logging. This type of persistence requires storage to a non-volatile medium such as a hard disk. The persistence manager has access to three types of storage for persisting the objects it is responsible for, listed in Table 39 below.
TABLE 39
Main processor memory (i.e. database)
FLASH memory of the main processor
The optional RTM hard drive
The primary mechanism for persisting aggregated information can be to store the native flow and expert objects in a hierarchical database. Reports (RMON, etc.) may be generated on an as needed (i.e. per query) basis from these objects eliminating the need to store RMON tables, etc. This aggregation can be performed as a background or periodic task, which collects objects from the flow processor and expert enabling them to focus on current (transient) flows only. There may be a second level to this mechanism whereby the optional media module hard drive is used to provide further long-term storage for these objects.
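As a purely illustrative sketch of this approach, aggregated flow objects may be kept in a nested (hierarchical) structure and a report computed only when queried, so that no report table is stored. The application, server, client ordering of the hierarchy, the two counters, and the function names are assumptions made for the example; the specification does not define the database layout or any particular report format.

```python
from collections import defaultdict


def new_store():
    """Hierarchy assumed for the example: application -> server -> client."""
    return defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: {"flows": 0, "bytes": 0}))
    )


def aggregate(store, application, server, client, byte_count):
    """Fold one completed (transient) flow into the persistent aggregate."""
    leaf = store[application][server][client]
    leaf["flows"] += 1
    leaf["bytes"] += byte_count


def report_by_application(store):
    """Build a per-application summary on demand from the stored objects."""
    report = {}
    for application, servers in store.items():
        total_flows = 0
        total_bytes = 0
        for clients in servers.values():
            for leaf in clients.values():
                total_flows += leaf["flows"]
                total_bytes += leaf["bytes"]
        report[application] = {"flows": total_flows, "bytes": total_bytes}
    return report
```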
The FLASH database is used for storing critical configuration data, which is to remain available even after power loss or reset events. The type of data to be stored in flash is listed in Table 40.
TABLE 40
General configuration data (modes, parameters, etc.)
Current clients and their enabled report types (RMON community strings, etc.)
Module, software and hardware version and capabilities information
Alarms, critical events and global counts (interface errors, etc.)
Other information
The persistence manager may encapsulate all three storage media behind a common interface (API) to minimize the impact of reassigning data from one storage area to another. The persistence manager is therefore responsible for the collection, storage and deletion (clean-up) of all persistent objects on the media module. The clients of this subsystem are listed in Table 41.
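One way such a common interface might look is sketched below. The sketch is illustrative only: the class names StorageBackend, DictBackend and PersistenceManager, the method names, and the medium labels "ram", "flash" and "hdd" are hypothetical, and an in-memory dictionary stands in for each of the three storage areas.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Interface every storage medium is assumed to implement."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def delete(self, key): ...


class DictBackend(StorageBackend):
    """In-memory stand-in used here for memory, FLASH and hard drive alike."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


class PersistenceManager:
    """Clients name objects by key and never address a medium directly."""

    def __init__(self, backends, default):
        self._backends = backends  # medium name -> StorageBackend
        self._default = default
        self._location = {}        # key -> medium name currently holding it

    def store(self, key, value, medium=None):
        medium = medium or self._default
        old = self._location.get(key)
        if old is not None and old != medium:
            self._backends[old].delete(key)  # keep a single copy
        self._backends[medium].put(key, value)
        self._location[key] = medium

    def load(self, key):
        medium = self._location.get(key)
        return None if medium is None else self._backends[medium].get(key)

    def reassign(self, key, medium):
        """Move an object to another medium without changing client code."""
        if key in self._location:
            self.store(key, self.load(key), medium)

    def delete(self, key):
        medium = self._location.pop(key, None)
        if medium is not None:
            self._backends[medium].delete(key)
```

Because clients call only store, load and delete, moving an object between media with reassign does not change how any client reads it, which is the benefit the common interface is intended to provide.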
TABLE 41
Media module RMON agent
Media module configuration manager
Media module logging, statistics, alarm and event managers
Media module triggers manager
Application server applications (i.e. MI expert)
Media Module Expert 3314 (See FIG. 33)
The system may support different experts that monitor different protocol layers as well as sets of protocols/applications that make up a service. The experts can be turned on and off independent of other experts within the system. The experts can be enabled on a Media Module basis, with all interfaces within the Media Module running the same set of experts. Each individual Media Module within the system can have a different set of experts running.
The media module expert subsystem is a real-time application monitoring and analysis engine running on the media module main processor, which builds information based on receiving per-packet data for selected flows. The main focus for this analysis is application performance monitoring (APM) which supports both RMON and local applications. This information is built upon and enhances information gathered by the flow processor and falls generally into three categories:
1. Monitoring information
2. Diagnostic information
3. Troubleshooting information
Monitoring information generally refers to functions related to providing APM metrics, deep application recognition and application subtype classification (e.g. MIME types over HTTP, etc.). Diagnostic information is gathered in focused monitoring modes and includes APM "drill-down" monitoring (i.e. TPM), as well as detecting any general network related anomalies. Troubleshooting information is gathered in diagnostic mode during fault isolation monitoring where a specific problem exists and a user is searching for an exact cause of the problem. This last type of information may include capture data as well as alarms and diagnoses. The two operating modes for the media module expert are monitoring mode and diagnostic mode. Different expert capabilities exist in each of these modes.
Table 42 below lists some processes that the media module expert subsystem is generally responsible for.
TABLE 42
Selecting a set of flows as candidates for analysis based on flow criteria
Providing deep application analysis on selected flows (depending on operating mode)
Providing application performance functions and metrics in monitoring mode
Providing deep application content (subtype) information in monitoring mode
Providing deep application distribution information including subtypes
Providing session layer information (login names, etc.) to augment APM when enabled
Providing transport performance metrics (TPM) as a diagnostic mode function
Providing transport layer and network layer monitoring in diagnostic mode
Providing focus set selection criteria to the flow processor depending on mode
Prioritizing flows within the selection set to avoid dropped packets
Performing "expert capture" functions in troubleshooting mode
Maintaining a correlation (binding) between expert objects and flow records
The media module expert uses the results of flow processing (classification) as a foundation for all of its operations. The flow processor stores the results of its parsing and classification in the shared memory between the two processors. The expert subsystem uses packets, events, flow records and parse descriptors produced by the flow processor in its processing and stores its own results (objects) in main processor memory. Several mechanisms exist which allow the expert subsystem to focus on a particular set of flows that are of interest at a given time. What constitutes flows as being of interest depends on the operating mode and protocol scheduling within the expert task.
FIG. 34 shows the main components of the media module expert subsystem 3314. As shown in FIG. 34, the media module expert is comprised of a set of component subsystems 3402-3410, which will be described in the following sections. In the system architecture, individual real-time expert components may be enabled independently of each other and do not necessarily require that all lower layers be enabled to process packets. Instead, all expert components rely on the parsing, filtering and classification results from the flow processor as a basis for their operation. In addition, all expert objects are tied to flows in that they are directly traceable (linked) to the flow record for the specific flow. For each flow that the expert processes, an expert flow record, containing a parameter area for each enabled component, is created in main processor memory. Each expert component has access to all areas of the flow record that may provide useful information for its processing.
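The relationship between a flow record and an expert flow record may be illustrated by the following minimal sketch. The field names, the use of plain dictionaries as parameter areas, and the function create_expert_record are assumptions introduced for the example and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRecord:
    """Stand-in for a record produced by the flow processor (assumed fields)."""
    flow_id: int
    client: str
    server: str
    application: str


@dataclass
class ExpertFlowRecord:
    """Expert-side record linked to exactly one flow record."""
    flow: FlowRecord
    # One parameter area per enabled expert component, keyed by component name.
    areas: dict = field(default_factory=dict)


def create_expert_record(flow, enabled_components):
    """Create the expert record with an empty area for each enabled component."""
    return ExpertFlowRecord(flow=flow, areas={name: {} for name in enabled_components})
```

In this sketch every component can read any area of the record, for example an application expert inspecting rec.areas["transport"], while the flow attribute keeps each expert object traceable to its flow record.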
Expert components are generally classified (and sub-classified) by layer according to their operations and include the main classes shown in Table 43.
TABLE 43
Network expert 3402
Transport expert 3404
Session expert 3406
Application expert 3408
Service expert 3410
Some experts may rely on other experts. For instance, the Services Experts can rely on multiple subclasses within the Application Expert to evaluate the specific service, or the Application Performance Monitoring Expert may rely on a Transport Expert to drill-down on what could be causing performance problems.
FIG. 35 illustrates a top-level Media Module Expert component classification 3500.
It should be noted that this classification is presented for analysis purposes only and does not imply any particular coding methodology. As can be seen, the only mandatory expert component is in the application monitoring class. The session and transport components (login and TPM in particular) are associated with application monitoring and may be provided to enhance APM functionality. Turning on any optional expert components will have an impact on APM performance.
Network Expert 3402 (See FIG. 34)
The network expert components are available in diagnostic mode and provide network layer analysis of potential problems that may affect application performance. Some of the functionality provided by these optional network layer expert components is set forth in Table 44 below. These expert components would not normally be activated in monitoring mode.
TABLE 44
Network layer symptoms
Network layer diagnoses
Network layer alarms
Transport Expert 3404 (See FIG. 34)
The transport expert components are available in diagnostic mode and provide transport layer analysis of potential problems that may affect application performance. In addition, a special class of transport expert (TPM expert) may provide transport performance metrics and is considered a diagnostic extension of APM that is used in "drill-down" mode. These metrics include statistical means, deviations, etc. and are particular to TPM. Some of the functionality provided by the other optional transport layer expert components is set forth in Table 45. These expert components would not normally be activated in monitoring mode.
TABLE 45
Transport layer symptoms
Transport layer diagnoses
Transport layer alarms
Tunneled transports
Session Expert 3406 (See FIG. 34)
The session expert components are available in diagnostic mode and provide session layer analysis of potential problems that may affect application performance. In addition, a special class of session expert (Login expert) may provide discovery and correlation of computer (host) and user names and logins and is considered a desired extension of APM. Table 46 illustrates s