United States Patent
5434913
Tung, et al.
July 18, 1995
Title
Audio subsystem for computer-based conferencing system
Abstract
An audio task resides on an audio/communications board of an audio subsystem in a computer conferencing system. An audio manager and an audio applications programming interface reside on a host processor of the computer conferencing system. The audio task receives local analog audio signals, generates local compressed audio signals corresponding to the local analog audio signals, and passes the local compressed audio signals to a communications subsystem of the computer conferencing system for transmission over a communications link to a remote computer conferencing system. The audio task receives remote compressed audio signals from the communications subsystem and generates remote decompressed audio signals corresponding to the remote compressed audio signals for local playback.
Inventors:
Tung; Peter (Beaverton, OR), Vrvilo; Ben (Portland, OR)
Assignee:
Intel Corporation (Santa Clara, CA)
Appl. No.:
158246
Filed:
November 24, 1993
Current U.S. Class:
379/202.01
709/204
709/247
Field of Search:
395/800,162 379/202,205,203,204
U.S. Patent Documents
4475193  October 1984   Brown
4888795  December 1989  Ando et al.
5014267  May 1991       Tompkins et al.
5073926  December 1991  Suzuki et al.
5157491  October 1992   Kassatly
5231492  July 1993      Dangi et al.
5315633  May 1994       Champa
5319793  June 1994      Hancock et al.
5335321  August 1994    Harney et al.
Other References
Computer Conferencing: IBM scientists demo prototype of affordable computer conferencing system, Nov. 2, 1992. EDGE, on & about AT&T, v7, n223, p. 22.
Primary Examiner:
Dwyer; James L.
Assistant Examiner:
Wolinsky; Scott
Attorney, Agent or Firm:
Mendelsohn; Steve, Murray; William H.
Claims
What is claimed is:
1. An audio subsystem for a computer conferencing system having a general-purpose host processor, comprising:
(a) a capture thread for:
(1) receiving local audio signals;
(2) compressing the local audio signals to generate local compressed audio signals; and
(3) passing the local compressed audio signals to a communications subsystem of the computer conferencing system for transmission over a communications link to a remote computer conferencing system; and
(b) a playback thread for:
(1) receiving remote compressed audio signals from the communications subsystem, the remote compressed audio signals having been transmitted by the remote computer conferencing system over the communications link; and
(2) decompressing the remote compressed audio signals to generate remote decompressed audio signals for local playback, wherein the capture thread is separate from the playback thread, wherein:
the capture thread and the playback thread are executed by a digital signal processor of the computer conferencing system;
wherein the host processor controls the execution of the capture thread and the playback thread.
2. The audio subsystem of claim 1, wherein:
the capture thread comprises:
(1) a capture SAC (Stereo Audio Codec) device driver for receiving the local audio signals;
(2) a capture echo/suppression driver for reducing echoes in the local audio signals;
(3) a capture mixer/splitter driver for amplifying the local audio signals and for splitting the local audio signals for recording;
(4) a compression driver for compressing the local audio signals; and
(5) a capture timestamp driver for appending timestamps to the local compressed audio signals; and
the playback thread comprises:
(1) a playback timestamp driver for stripping timestamps from the remote compressed audio signals;
(2) a decompression driver for decompressing the remote compressed audio signals;
(3) a playback mixer/splitter driver for amplifying the remote decompressed audio signals and for splitting the remote decompressed audio signals for recording;
(4) a playback echo/suppression driver for reducing echoes in the remote decompressed audio signals; and
(5) a playback SAC device driver for transmitting the remote decompressed audio signals for local playback.
3. The audio subsystem of claim 1, wherein the digital signal processor is part of a combined audio/communications board of the computer conferencing system and wherein the audio subsystem further comprises:
(c) an audio manager executed by the host processor for controlling the operations of the audio subsystem; and
(d) an audio applications programming interface executed by the host processor for providing an interface between an application and the audio subsystem.
4. A computer conferencing system, comprising:
an audio subsystem adapted for residing partially in a general-purpose host processor of the computer conferencing system and partially in an audio board of the computer conferencing system, wherein the audio subsystem comprises:
(1) a capture thread for:
(i) receiving local audio signals;
(ii) compressing the local audio signals to generate local compressed audio signals; and
(iii) passing the local compressed audio signals to a communications subsystem of the computer conferencing system for transmission over a communications link to a remote computer conferencing system; and
(2) a playback thread for:
(i) receiving remote compressed audio signals from the communications subsystem, the remote compressed audio signals having been transmitted by the remote computer conferencing system over the communications link; and
(ii) decompressing the remote compressed audio signals to generate remote decompressed audio signals for local playback, wherein the capture thread is separate from the playback thread, wherein:
the capture thread and the playback thread are executed by a digital signal processor of the computer conferencing system;
wherein the host processor controls the execution of the capture thread and the playback thread.
5. The system of claim 4, wherein:
the capture thread comprises:
(i) a capture SAC (Stereo Audio Codec) device driver for receiving the local audio signals;
(ii) a capture echo/suppression driver for reducing echoes in the local audio signals;
(iii) a capture mixer/splitter driver for amplifying the local audio signals and for splitting the local audio signals for recording;
(iv) a compression driver for compressing the local audio signals; and
(v) a capture timestamp driver for appending timestamps to the local compressed audio signals; and
the playback thread comprises:
(i) a playback timestamp driver for stripping timestamps from the remote compressed audio signals;
(ii) a decompression driver for decompressing the remote compressed audio signals;
(iii) a playback mixer/splitter driver for amplifying the remote decompressed audio signals and for splitting the remote decompressed audio signals for recording;
(iv) a playback echo/suppression driver for reducing echoes in the remote decompressed audio signals; and
(v) a playback SAC device driver for transmitting the remote decompressed audio signals for local playback.
6. The system of claim 4, wherein the audio subsystem further comprises:
(3) an audio manager executed by the host processor for controlling the operations of the audio subsystem; and
(4) an audio applications programming interface executed by the host processor and for providing an interface between an application and the audio subsystem.
7. An audio subsystem for a computer conferencing system having a general-purpose host processor, comprising:
(a) a capture thread for:
(1) receiving local audio signals;
(2) compressing the local audio signals to generate local compressed audio signals; and
(3) passing the local compressed audio signals to a communications subsystem of the computer conferencing system for transmission over a communications link to a remote computer conferencing system; and
(b) a playback thread for:
(1) receiving remote compressed audio signals from the communications subsystem, the remote compressed audio signals having been transmitted by the remote computer conferencing system over the communications link; and
(2) decompressing the remote compressed audio signals to generate remote decompressed audio signals for local playback, wherein the capture thread is separate from the playback thread, wherein:
the capture thread comprises two or more capture drivers, wherein the two or more capture drivers comprise two or more of:
(1) a capture SAC (Stereo Audio Codec) device driver for receiving the local audio signals;
(2) a capture echo/suppression driver for reducing echoes in the local audio signals;
(3) a capture mixer/splitter driver for amplifying the local audio signals and for splitting the local audio signals for recording;
(4) a compression driver for compressing the local audio signals; and
(5) a capture timestamp driver for appending timestamps to the local compressed audio signals; and
the playback thread comprises two or more playback drivers, wherein the two or more playback drivers comprise two or more of:
(1) a playback timestamp driver for stripping timestamps from the remote compressed audio signals;
(2) a decompression driver for decompressing the remote compressed audio signals;
(3) a playback mixer/splitter driver for amplifying the remote decompressed audio signals and for splitting the remote decompressed audio signals for recording;
(4) a playback echo/suppression driver for reducing echoes in the remote decompressed audio signals; and
(5) a playback SAC device driver for transmitting the remote decompressed audio signals for local playback.
8. The audio subsystem of claim 7, wherein:
the capture thread comprises:
(1) a capture SAC device driver for receiving the local audio signals;
(2) a capture echo/suppression driver for reducing echoes in the local audio signals;
(3) a capture mixer/splitter driver for amplifying the local audio signals and for splitting the local audio signals for recording;
(4) a compression driver for compressing the local audio signals; and
(5) a capture timestamp driver for appending timestamps to the local compressed audio signals; and
the playback thread comprises:
(1) a playback timestamp driver for stripping timestamps from the remote compressed audio signals;
(2) a decompression driver for decompressing the remote compressed audio signals;
(3) a playback mixer/splitter driver for amplifying the remote decompressed audio signals and for splitting the remote decompressed audio signals for recording;
(4) a playback echo/suppression driver for reducing echoes in the remote decompressed audio signals; and
(5) a playback SAC device driver for transmitting the remote decompressed audio signals for local playback.
9. The audio subsystem of claim 7, wherein the capture thread and the playback thread are executed by a digital signal processor of the computer conferencing system and wherein the host processor controls the execution of the capture thread and the playback thread.
10. The audio subsystem of claim 7, wherein the digital signal processor is part of a combined audio/communications board of the computer conferencing system and wherein the audio subsystem further comprises:
(c) an audio manager executed by the host processor for controlling the operations of the audio subsystem; and
(d) an audio applications programming interface executed by the host processor for providing an interface between an application and the audio subsystem.
11. The audio subsystem of claim 7, wherein:
the capture thread comprises:
(1) a capture SAC device driver for receiving the local audio signals;
(2) a capture echo/suppression driver for reducing echoes in the local audio signals;
(3) a capture mixer/splitter driver for amplifying the local audio signals and for splitting the local audio signals for recording;
(4) a compression driver for compressing the local audio signals; and
(5) a capture timestamp driver for appending timestamps to the local compressed audio signals;
the playback thread comprises:
(1) a playback timestamp driver for stripping timestamps from the remote compressed audio signals;
(2) a decompression driver for decompressing the remote compressed audio signals;
(3) a playback mixer/splitter driver for amplifying the remote decompressed audio signals and for splitting the remote decompressed audio signals for recording;
(4) a playback echo/suppression driver for reducing echoes in the remote decompressed audio signals; and
(5) a playback SAC device driver for transmitting the remote decompressed audio signals for local playback;
the capture thread and the playback thread are executed by a digital signal processor of the computer conferencing system;
the host processor controls the execution of the capture thread and the playback thread;
the digital signal processor is part of a combined audio/communications board of the computer conferencing system; and
the audio subsystem further comprises:
(c) an audio manager executed by the host processor for controlling the operations of the audio subsystem; and
(d) an audio applications programming interface executed by the host processor for providing an interface between an application and the audio subsystem.
12. A computer conferencing system, comprising:
an audio subsystem adapted for residing partially in a general-purpose host processor of the computer conferencing system and partially in an audio board of the computer conferencing system, wherein the audio subsystem comprises:
(1) a capture thread for:
(i) receiving local audio signals;
(ii) compressing the local audio signals to generate local compressed audio signals; and
(iii) passing the local compressed audio signals to a communications subsystem of the computer conferencing system for transmission over a communications link to a remote computer conferencing system; and
(2) a playback thread for:
(i) receiving remote compressed audio signals from the communications subsystem, the remote compressed audio signals having been transmitted by the remote computer conferencing system over the communications link; and
(ii) decompressing the remote compressed audio signals to generate remote decompressed audio signals for local playback, wherein the capture thread is separate from the playback thread, wherein:
the capture thread comprises two or more capture drivers, wherein the two or more capture drivers comprise two or more of:
(1) a capture SAC (Stereo Audio Codec) device driver for receiving the local audio signals;
(2) a capture echo/suppression driver for reducing echoes in the local audio signals;
(3) a capture mixer/splitter driver for amplifying the local audio signals and for splitting the local audio signals for recording;
(4) a compression driver for compressing the local audio signals; and
(5) a capture timestamp driver for appending timestamps to the local compressed audio signals; and
the playback thread comprises two or more playback drivers, wherein the two or more playback drivers comprise two or more of:
(1) a playback timestamp driver for stripping timestamps from the remote compressed audio signals;
(2) a decompression driver for decompressing the remote compressed audio signals;
(3) a playback mixer/splitter driver for amplifying the remote decompressed audio signals and for splitting the remote decompressed audio signals for recording;
(4) a playback echo/suppression driver for reducing echoes in the remote decompressed audio signals; and
(5) a playback SAC device driver for transmitting the remote decompressed audio signals for local playback.
13. The system of claim 12, wherein:
the capture thread comprises:
(i) a capture SAC device driver for receiving the local audio signals;
(ii) a capture echo/suppression driver for reducing echoes in the local audio signals;
(iii) a capture mixer/splitter driver for amplifying the local audio signals and for splitting the local audio signals for recording;
(iv) a compression driver for compressing the local audio signals; and
(v) a capture timestamp driver for appending timestamps to the local compressed audio signals; and
the playback thread comprises:
(i) a playback timestamp driver for stripping timestamps from the remote compressed audio signals;
(ii) a decompression driver for decompressing the remote compressed audio signals;
(iii) a playback mixer/splitter driver for amplifying the remote decompressed audio signals and for splitting the remote decompressed audio signals for recording;
(iv) a playback echo/suppression driver for reducing echoes in the remote decompressed audio signals; and
(v) a playback SAC device driver for transmitting the remote decompressed audio signals for local playback.
14. The system of claim 12, wherein the capture thread and the playback thread are executed by a digital signal processor of the audio board and wherein the host processor controls the execution of the capture thread and the playback thread.
15. The system of claim 12, wherein the audio subsystem further comprises:
(3) an audio manager executed by the host processor for controlling the operations of the audio subsystem; and
(4) an audio applications programming interface executed by the host processor for providing an interface between an application and the audio subsystem.
16. The system of claim 12, wherein:
the capture thread comprises:
(i) a capture SAC device driver for receiving the local audio signals;
(ii) a capture echo/suppression driver for reducing echoes in the local audio signals;
(iii) a capture mixer/splitter driver for amplifying the local audio signals and for splitting the local audio signals for recording;
(iv) a compression driver for compressing the local audio signals; and
(v) a capture timestamp driver for appending timestamps to the local compressed audio signals;
the playback thread comprises:
(i) a playback timestamp driver for stripping timestamps from the remote compressed audio signals;
(ii) a decompression driver for decompressing the remote compressed audio signals;
(iii) a playback mixer/splitter driver for amplifying the remote decompressed audio signals and for splitting the remote decompressed audio signals for recording;
(iv) a playback echo/suppression driver for reducing echoes in the remote decompressed audio signals; and
(v) a playback SAC device driver for transmitting the remote decompressed audio signals for local playback;
the capture thread and the playback thread are executed by a digital signal processor of the audio board;
the host processor controls the execution of the capture thread and the playback thread; and
the audio subsystem further comprises:
(3) an audio manager executed by the host processor and for controlling the operations of the audio subsystem; and
(4) an audio applications programming interface executed by the host processor for providing an interface between an application and the audio subsystem.
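The claims describe each thread as an ordered chain of drivers, with the playback chain mirroring the capture chain (timestamps appended last on capture are stripped first on playback). The following minimal Python sketch models only that ordering. Everything concrete in it is a hypothetical stand-in chosen for illustration: the function names, the signed 16-bit sample lists, the fixed gains, the pass-through echo "suppression", the placeholder "codec" that merely packs samples into bytes without compressing them, and the 32-bit tick counter used as a timestamp. None of it is the patent's actual DSP driver code.

```python
import struct
from typing import Callable, List, Sequence

# A driver receives one buffer and returns the buffer for the next driver.
Driver = Callable[[object], object]


class Clock:
    """Hypothetical tick counter used by the capture timestamp driver."""

    def __init__(self) -> None:
        self.ticks = 0


def clip16(x: int) -> int:
    """Clamp a sample to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def run_thread(drivers: Sequence[Driver], buf: object) -> object:
    """Pass one buffer through an ordered chain of drivers."""
    for driver in drivers:
        buf = driver(buf)
    return buf


def make_capture_thread(clock: Clock, recording: List[List[int]], gain: int = 2) -> List[Driver]:
    def sac_in(samples):          # capture SAC driver stand-in: accepts digitized samples
        return list(samples)

    def echo_suppress(samples):   # placeholder: passes samples through unchanged
        return samples

    def mix_split(samples):       # amplify, then split a copy off for recording
        out = [clip16(s * gain) for s in samples]
        recording.append(list(out))
        return out

    def compress(samples):        # placeholder codec: packs int16 samples, no real compression
        return struct.pack(f"<{len(samples)}h", *samples)

    def add_timestamp(payload):   # prepend a 32-bit little-endian timestamp
        stamped = struct.pack("<I", clock.ticks) + payload
        clock.ticks += 1
        return stamped

    return [sac_in, echo_suppress, mix_split, compress, add_timestamp]


def make_playback_thread(stamps: List[int], recording: List[List[int]], gain: int = 1) -> List[Driver]:
    def strip_timestamp(packet):  # remove and record the 32-bit timestamp
        (ts,) = struct.unpack_from("<I", packet)
        stamps.append(ts)
        return packet[4:]

    def decompress(payload):      # inverse of the placeholder codec above
        return list(struct.unpack(f"<{len(payload) // 2}h", payload))

    def mix_split(samples):       # amplify, then split a copy off for recording
        out = [clip16(s * gain) for s in samples]
        recording.append(list(out))
        return out

    def echo_suppress(samples):   # placeholder: passes samples through unchanged
        return samples

    def sac_out(samples):         # playback SAC driver stand-in: would feed the D/A converter
        return samples

    return [strip_timestamp, decompress, mix_split, echo_suppress, sac_out]


if __name__ == "__main__":
    clock, recording, stamps = Clock(), [], []
    packet = run_thread(make_capture_thread(clock, recording), [1, -2, 3])
    played = run_thread(make_playback_thread(stamps, recording), packet)
    print(played, stamps)  # [2, -4, 6] [0]
```

Keeping each driver as a separate callable in a list mirrors the claim language, in which the threads comprise "two or more" of the listed drivers: a chain can be shortened or reordered without touching the other drivers.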
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to audio/video conferencing, and, in particular, to systems for real-time audio, video, and data conferencing in windowed environments on personal computer systems.
2. Description of the Related Art
It is desirable to provide real-time audio, video, and data conferencing between personal computer (PC) systems operating in windowed environments such as those provided by versions of the Microsoft® Windows operating system. Providing real-time conferencing in non-real-time windowed environments, however, presents difficulties.
It is accordingly an object of this invention to overcome the disadvantages and drawbacks of the known art and to provide real-time audio, video, and data conferencing between PC systems operating in non-real-time windowed environments.
It is a particular object of the present invention to provide real-time audio, video, and data conferencing between PC systems operating under a Microsoft® Windows operating system.
Further objects and advantages of this invention will become apparent from the detailed description of a preferred embodiment which follows.
SUMMARY OF THE INVENTION
The present invention is an audio subsystem for a computer conferencing system. An audio task resides on an audio/communications board of the computer conferencing system. An audio manager and an audio applications programming interface reside on a host processor of the computer conferencing system. The audio task receives local analog audio signals, generates local compressed audio signals corresponding to the local analog audio signals, and passes the local compressed audio signals to a communications subsystem of the computer conferencing system for transmission over a communications link to a remote computer conferencing system. The audio task receives remote compressed audio signals from the communications subsystem and generates remote decompressed audio signals corresponding to the remote compressed audio signals for local playback.
BRIEF DESCRIPTION OF THE DRAWINGS
Other objects, features, and advantages of the present invention will become more fully apparent from the following detailed description of the preferred embodiment, the appended claims, and the accompanying drawings in which:
FIG. 1 is a block diagram representing real-time point-to-point audio, video, and data conferencing between two PC systems, according to a preferred embodiment of the present invention;
FIG. 2 is a block diagram of the hardware configuration of the conferencing system of each PC system of FIG. 1;
FIG. 3 is a block diagram of the hardware configuration of the video board of the conferencing system of FIG. 2;
FIG. 4 is a block diagram of the hardware configuration of the audio/comm board of the conferencing system of FIG. 2;
FIG. 5 is a block diagram of the software configuration of the conferencing system of each PC system of FIG. 1;
FIG. 6 is a block diagram of a preferred embodiment of the hardware configuration of the audio/comm board of FIG. 4;
FIG. 7 is a block diagram of the conferencing interface layer between the conferencing applications of FIG. 5, on one side, and the comm, video, and audio managers of FIG. 5, on the other side;
FIG. 8 is a representation of the conferencing call finite state machine (FSM) for a conferencing session between a local conferencing system (i.e., caller) and a remote conferencing system (i.e., callee);
FIG. 9 is a representation of the conferencing stream FSM for each conferencing system participating in a conferencing session;
FIG. 10 is a representation of the video FSM for the local video stream and the remote video stream of a conferencing system during a conferencing session;
FIG. 11 is a block diagram of the software components of the video manager of the conferencing system of FIG. 5;
FIG. 12 is a representation of a sequence of N walking key frames;
FIG. 13 is a representation of the audio FSM for the local audio stream and the remote audio stream of a conferencing system during a conferencing session;
FIG. 14 is a block diagram of the architecture of the audio subsystem of the conferencing system of FIG. 5;
FIG. 15 is a block diagram of the interface between the audio task of FIG. 5 and the audio hardware of the audio/comm board of FIG. 2;
FIG. 16 is a block diagram of the interface between the audio task and the comm task of FIG. 5;
FIG. 17 is a block diagram of the comm subsystem of the conferencing system of FIG. 5;
FIG. 18 is a block diagram of the comm subsystem architecture for two conferencing systems of FIG. 5 participating in a conferencing session;
FIG. 19 is a representation of the comm subsystem application FSM for a conferencing session between a local site and a remote site;
FIG. 20 is a representation of the comm subsystem connection FSM for a conferencing session between a local site and a remote site;
FIG. 21 is a representation of the comm subsystem control channel handshake FSM for a conferencing session between a local site and a remote site;
FIG. 22 is a representation of the comm subsystem channel establishment FSM for a conferencing session between a local site and a remote site;
FIG. 23 is a representation of the comm subsystem processing for a typical conferencing session between a caller and a callee;
FIG. 24 is a representation of the structure of a video packet as sent to or received from the comm subsystem of the conferencing system of FIG. 5;
FIG. 25 is a representation of the compressed video bitstream for the conferencing system of FIG. 5;
FIG. 26 is a representation of a compressed audio packet for the conferencing system of FIG. 5;
FIG. 27 is a representation of the reliable transport comm packet structure;
FIG. 28 is a representation of the unreliable transport comm packet structure;
FIG. 29 comprises diagrams indicating typical connection setup and teardown sequences;
FIGS. 30 and 31 are diagrams of the architecture of the audio/comm board; and
FIG. 32 is a diagram of the audio/comm board environment.
DESCRIPTION OF THE PREFERRED EMBODIMENT(S)
Point-To-Point Conferencing Network
Referring now to FIG. 1, there is shown a block diagram representing real-time point-to-point audio, video, and data conferencing between two PC systems, according to a preferred embodiment of the present invention. Each PC system has a conferencing system 100, a camera 102, a microphone 104, a monitor 106, and a speaker 108. The conferencing systems communicate via an integrated services digital network (ISDN) 110. Each conferencing system 100 receives, digitizes, and compresses the analog video signals generated by camera 102 and the analog audio signals generated by microphone 104. The compressed digital video and audio signals are transmitted to the other conferencing system via ISDN 110, where they are decompressed and converted for play on monitor 106 and speaker 108, respectively. In addition, each conferencing system 100 may generate and transmit data signals to the other conferencing system 100 for play on monitor 106. In a preferred embodiment, the video and data signals are displayed in different windows on monitor 106. Each conferencing system 100 may also display the locally generated video signals in a separate window.
Camera 102 may be any suitable camera for generating NTSC or PAL analog video signals. Microphone 104 may be any suitable microphone for generating analog audio signals. Monitor 106 may be any suitable monitor for displaying video and graphics images and is preferably a VGA monitor. Speaker 108 may be any suitable device for playing analog audio signals and is preferably a headset.
Conferencing System Hardware Configuration
Referring now to FIG. 2, there is shown a block diagram of the hardware configuration of each conferencing system 100 of FIG. 1, according to a preferred embodiment of the present invention. Each conferencing system 100 comprises host processor 202, video board 204, audio/comm board 206, and industry standard architecture (ISA) bus 208.
Referring now to FIG. 3, there is shown a block diagram of the hardware configuration of video board 204 of FIG. 2, according to a preferred embodiment of the present invention. Video board 204 comprises ISA bus interface 310, video bus 312, pixel processor 302, video random access memory (VRAM) device 304, video capture module 306, and video analog-to-digital (A/D) converter 308.
Referring now to FIG. 4, there is shown a block diagram of the hardware configuration of audio/comm board 206 of FIG. 2, according to a preferred embodiment of the present invention. Audio/comm board 206 comprises ISDN interface 402, memory 404, digital signal processor (DSP) 406, ISA bus interface 408, and audio input/output (I/O) hardware 410.
Conferencing System Software Configuration
Referring now to FIG. 5, there is shown a block diagram of the software configuration of each conferencing system 100 of FIG. 1, according to a preferred embodiment of the present invention. Video microcode 530 resides and runs on pixel processor 302 of video board 204 of FIG. 3. Comm task 540 and audio task 538 reside and run on DSP 406 of audio/comm board 206 of FIG. 4. All of the other software modules depicted in FIG. 5 reside and run on host processor 202 of FIG. 2.
Video, Audio, and Data Processing
Referring now to FIGS. 3, 4, and 5, audio/video conferencing application 502 running on host processor 202 provides the top-level local control of audio and video conferencing between a local conferencing system (i.e., local site or endpoint) and a remote conferencing system (i.e., remote site or endpoint). Audio/video conferencing application 502 controls local audio and video processing and establishes links with the remote site for transmitting and receiving audio and video over the ISDN. Similarly, data conferencing application 504, also running on host processor 202, provides the top-level local control of data conferencing between the local and remote sites. Conferencing applications 502 and 504 communicate with the audio, video, and comm subsystems using conferencing application programming interface (API) 506, video API 508, comm API 510, and audio API 512. The functions of conferencing applications 502 and 504 and the APIs they use are described in further detail later in this specification.
During conferencing, audio I/O hardware 410 of audio/comm board 206 digitizes analog audio signals received from microphone 104 and stores the resulting uncompressed digital audio to memory 404 via ISA bus interface 408. Audio task 538, running on DSP 406, controls the compression of the uncompressed audio and stores the resulting compressed audio back to memory 404. Comm task 540, also running on DSP 406, then formats the compressed audio for ISDN transmission and transmits the ISDN-formatted compressed audio to ISDN interface 402 for transmission to the remote site over ISDN 110.
ISDN interface 402 also receives from ISDN 110 compressed ISDN-formatted audio generated by the remote site and stores the compressed ISDN-formatted audio to memory 404. Comm task 540 then reconstructs the compressed audio format and stores the compressed audio back to memory 404. Audio task 538 controls the decompression of the compressed audio and stores the resulting decompressed audio back to memory 404. ISA bus interface 408 then transmits the decompressed audio to audio I/O hardware 410, which performs digital-to-analog (D/A) conversion of the decompressed audio and transmits the resulting analog audio signals to speaker 108 for play.
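The two paragraphs above have comm task 540 "format" compressed audio for ISDN transmission on the sending side and "reconstruct" it on the receiving side. As a rough illustration of what such a formatting and reconstruction step involves, here is a minimal Python sketch of fragmenting a message into bounded-size frames and reassembling it per channel. The 4-byte header (channel id, sequence number, last-fragment flag), the 64-byte frame limit, and the channel numbers are hypothetical choices made for this example; the patent's own packet structures appear in FIGS. 26 through 28 and are not reproduced here. The sketch also assumes frames arrive in order and intact, so it performs no loss or error handling.

```python
import struct
from typing import Dict, List, Optional, Tuple

# Hypothetical frame header, big-endian with no padding:
# channel id (1 byte), sequence number (2 bytes), flags (1 byte).
HEADER = struct.Struct(">BHB")
LAST_FRAGMENT = 0x01  # flag bit: this frame ends the message


def frame_message(channel: int, seq: int, payload: bytes, max_frame: int = 64) -> List[bytes]:
    """Split one compressed message into frames of at most max_frame bytes."""
    body = max_frame - HEADER.size
    if body <= 0:
        raise ValueError("max_frame must exceed the header size")
    chunks = [payload[i:i + body] for i in range(0, len(payload), body)] or [b""]
    frames = []
    for i, chunk in enumerate(chunks):
        flags = LAST_FRAGMENT if i == len(chunks) - 1 else 0
        frames.append(HEADER.pack(channel, (seq + i) & 0xFFFF, flags) + chunk)
    return frames


class Reassembler:
    """Rebuild whole messages per channel, assuming in-order, loss-free frames."""

    def __init__(self) -> None:
        self.partial: Dict[int, bytearray] = {}

    def feed(self, frame: bytes) -> Optional[Tuple[int, bytes]]:
        channel, _seq, flags = HEADER.unpack_from(frame)
        buf = self.partial.setdefault(channel, bytearray())
        buf += frame[HEADER.size:]  # bytearray extends in place
        if flags & LAST_FRAGMENT:
            del self.partial[channel]
            return channel, bytes(buf)
        return None


if __name__ == "__main__":
    audio = frame_message(channel=1, seq=0, payload=bytes(100))  # 2 frames
    data = frame_message(channel=3, seq=0, payload=b"hello")     # 1 frame
    r = Reassembler()
    for f in [audio[0], data[0], audio[1]]:  # interleaved on one link
        done = r.feed(f)
        if done:
            print(done[0], len(done[1]))  # prints "3 5", then "1 100"
```

Keying partial messages by channel id lets frames from different streams (here, hypothetical audio and data channels) share one link while each stream is still reconstructed intact.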
Thus, audio capture/compression and decompression/playback are preferably performed entirely within audio/comm board 206 without going through the host processor. As a result, audio is preferably continuously played during a conferencing session regardless of what other applications are running on host processor 202.
Concurrent with the audio processing, video A/D converter 308 of video board 204 digitizes analog video signals received from camera 102 and transmits the resulting digitized video to video capture module 306. Video capture module 306 decodes the digitized video into YUV color components and delivers uncompressed digital video bitmaps to VRAM 304 via video bus 312. Video microcode 530, running on pixel processor 302, compresses the uncompressed video bitmaps and stores the resulting compressed video back to VRAM 304. ISA bus interface 310 then transmits via ISA bus 208 the compressed video to host interface 526 running on host processor 202.
Host interface 526 passes the compressed video to video manager 516 via video capture driver 522. Video manager 516 calls audio manager 520 using audio API 512 for synchronization information. Video manager 516 then time-stamps the video for synchronization with the audio. Video manager 516 passes the time-stamped compressed video to communications (comm) manager 518 using comm application programming interface (API) 510. Comm manager 518 passes the compressed video through digital signal processing (DSP) interface 528 to ISA bus interface 408 of audio/comm board 206, which stores the compressed video to memory 404. Comm task 540 then formats the compressed video for ISDN transmission and transmits the ISDN-formatted compressed video to ISDN interface 402 for transmission to the remote site over ISDN 110.
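The paragraph above states only that video manager 516 obtains synchronization information from audio manager 520 and then time-stamps the video. One plausible way to picture this, sketched in Python under stated assumptions, is an audio clock derived from the number of audio samples processed, so that video timestamps follow the audio rate rather than the host's non-real-time scheduling. The class and function names, the 8 kHz sample rate, the millisecond units, and the receive-side `in_sync` check with its 40 ms tolerance are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


class AudioClock:
    """Hypothetical stand-in for the timing information the audio manager exposes.

    Time is derived from the count of audio samples processed, so it advances
    at the audio rate rather than with the host's wall clock.
    """

    def __init__(self, sample_rate_hz: int = 8000) -> None:  # assumed rate
        self.sample_rate_hz = sample_rate_hz
        self.samples = 0

    def advance(self, n_samples: int) -> None:
        self.samples += n_samples

    def now_ms(self) -> int:
        return self.samples * 1000 // self.sample_rate_hz


@dataclass
class VideoFrame:
    data: bytes
    timestamp_ms: int = -1  # -1 means "not yet stamped"


def stamp_frame(frame: VideoFrame, clock: AudioClock) -> VideoFrame:
    """Query the audio clock, then time-stamp the compressed video frame."""
    frame.timestamp_ms = clock.now_ms()
    return frame


def in_sync(frame: VideoFrame, clock: AudioClock, tolerance_ms: int = 40) -> bool:
    """Hypothetical receive-side check: is the frame near the current audio time?"""
    return abs(frame.timestamp_ms - clock.now_ms()) <= tolerance_ms


if __name__ == "__main__":
    clock = AudioClock()
    clock.advance(1600)                       # 1600 samples at 8 kHz = 200 ms
    frame = stamp_frame(VideoFrame(b"\x00"), clock)
    print(frame.timestamp_ms)                 # 200
```

Deriving time from the audio sample count ties the video timestamps to the audio stream, which the DSP processes continuously, rather than to host timers.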
ISDN interface 402 also receives from ISDN 110 ISDN-formatted compressed video generated by the remote site system and stores the ISDN-formatted compressed video to memory 404. Comm task 540 reconstructs the compressed video format and stores the resulting compressed video back to memory 404. ISA bus interface 408 then transmits the compressed video to comm manager 518 via ISA bus 208 and DSP interface 528. Comm manager 518 passes the compressed video to video manager 516 using comm API 510. Video manager 516 decompresses the compressed video and transmits the decompressed video to the graphics device interface (GDI) (not shown) of Microsoft® Windows for eventual display in a video window on monitor 106.
For data conferencing, concurrent with audio and video conferencing, data conferencing application 504 generates and passes data to comm manager 518 using conferencing API 506 and comm API 510. Comm manager 518 passes the data through board DSP interface 532 to ISA bus interface 408, which stores the data to memory 404. Comm task 540 formats the data for ISDN transmission and stores the ISDN-formatted data back to memory 404. ISDN interface 402 then transmits the ISDN-formatted data to the remote site over ISDN 110.
ISDN interface 402 also receives from ISDN 110 ISDN-formatted data generated by the remote site and stores the ISDN-formatted data to memory 404. Comm task 540 reconstructs the data format and stores the resulting data back to memory 404. ISA bus interface 408 then transmits the data to comm manager 518 via ISA bus 208 and DSP interface 528. Comm manager 518 passes the data to data conferencing application 504 using comm API 510 and conferencing API 506. Data conferencing application 504 processes the data and transmits the processed data to Microsoft® Windows GDI (not shown) for display in a data window on monitor 106.
Preferred Hardware Configuration for Conferencing System
Referring again to FIG. 2, host processor 202 may be any suitable general-purpose processor and is preferably an Intel® processor such as an Intel® 486 microprocessor. Host processor 202 preferably has at least 8 megabytes of host memory. Bus 208 may be any suitable digital communications bus and is preferably an Industry Standard Architecture (ISA) PC bus.
Referring again to FIG. 3, video A/D converter 308 of video board 204 may be any standard hardware for digitizing and decoding analog video signals that are preferably NTSC or PAL standard video signals. Video capture module 306 may be any suitable device for capturing digital video color component bitmaps and is preferably an Intel® ActionMedia® II Capture Module. Video capture module 306 preferably captures video as subsampled 4:1:1 YUV bitmaps (i.e., YUV9 or YVU9). Memory 304 may be any suitable computer memory device for storing data during video processing such as a random access memory (RAM) device and is preferably a video RAM (VRAM) device with at least 1 megabyte of data storage capacity. Pixel processor 302 may be any suitable processor for compressing video data and is preferably an Intel® pixel processor such as an Intel® i750® Pixel Processor. Video bus 312 may be any suitable digital communications bus and is preferably an Intel® DVI® bus. ISA bus interface 310 may be any suitable interface between ISA bus 208 and video bus 312, and preferably comprises three Intel® ActionMedia® Gate Arrays and ISA configuration jumpers.
Referring now to FIG. 6, there is shown a block diagram of a preferred embodiment of the hardware configuration of audio/comm board 206 of FIG. 4. This preferred embodiment comprises:
Two 4-wire S-bus RJ-45 ISDN interface connectors, one for output to ISDN 110 and one for input from ISDN 110. Part of ISDN interface 402 of FIG. 4.
Standard bypass relay allowing incoming calls to be redirected to a down-line ISDN phone (not shown) in case conferencing system power is off or conferencing software is not loaded. Part of ISDN interface 402.
Two standard analog isolation and filter circuits for interfacing with ISDN 110. Part of ISDN interface 402.
Two Siemens 8-bit D-channel PEB2085 ISDN interface chips. Part of ISDN interface 402.
Texas Instruments (TI) 32-bit 33 MHz TMS320C31 Digital Signal Processor. Equivalent to DSP 406.
Custom ISDN/DSP interface application-specific integrated circuit (ASIC) to provide interface between 8-bit Siemens chip set and 32-bit TI DSP. Part of ISDN interface 402.
256 Kw Dynamic RAM (DRAM) memory device. Part of memory 404.
32 Kw Static RAM (SRAM) memory device. Part of memory 404.
Custom DSP/ISA interface ASIC to provide interface between 32-bit TI DSP and ISA bus 208. Part of ISA bus interface 408.
Serial EEPROM to provide software jumpers for DSP/ISA interface. Part of ISA bus interface 408.
Audio Codec 4215 by Analog Devices, Inc. for sampling audio in a format such as ADPCM, DPCM, or PCM. Part of audio I/O hardware 410.
Analog circuitry to drive audio I/O with internal speaker for playback and audio jacks for input of analog audio from microphone 104 and for output of analog audio to speaker 108. Part of audio I/O hardware 410.
Referring now to FIGS. 30 and 31, there are shown diagrams of the architecture of the audio/comm board. The audio/comm board consists basically of a slave ISA interface, a TMS320C31 DSP core, an ISDN BRI S interface, and a high quality audio interface.
The C31 Interface is a 32-bit non-multiplexed data port to the VC ASIC. It is designed to operate with a 27-33 MHz C31. The ASIC is decoded to occupy C31 addresses 400 000H through 44F FFFH. All accesses to local ASIC registers (including the FIFOs) are 0 wait-state. Accesses to the I/O bus (locations 440 000H through 44F FFFH) have 3 wait states inserted. Some of the registers in the ASIC are only 8 or 16 bits wide. In these cases, the data is aligned to the bottom (bit 0 and up) of the C31 data word, and the remaining bits read as "0". All nonexistent or reserved register locations read as "0".
The B-channel interfaces provide a 32-bit data path to and from the B1 and B2 ISDN data channels. They are FIFO buffered to reduce interrupt overhead and latency requirements. The Line-side and Phone-side interfaces both support transparent data transfer, which is used for normal phone calls, FAX, modem, and H.221 formatted data. Both interfaces also support HDLC formatting of the B data per channel to support V.120 "data data" transfer.
The receive and transmit FIFOs are 2 words deep, a word being 32 bits wide (the C31 native data width). Full, half, and empty indications for all FIFOs are provided in the B-channel status registers. Note that the polarity of these indications varies between receive and transmit. This provides the correct interrupt signaling for interrupt-synchronized data transfer.
The transparent mode sends data written to the B-channel transmit FIFOs to the SSI interface of the ISACs. The transmitted data is not formatted in any way other than maintaining byte alignment (i.e., bits 0, 8, 16, and 24 of the FIFO data are always transmitted in bit 0 of the B-channel data). The written FIFO data is transmitted byte 0 first and byte 3 last, where byte 0 is bits 0 through 7; within each byte, bit 0 is sent first.
Transparent mode received data is also byte aligned to the incoming B-channel data stream and assembled as byte 0, byte 1, byte 2, byte 3. Receive data is written into the receive FIFO after all four bytes have arrived.
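The address decode and narrow-register rules above can be sketched as follows. This is an illustrative model, not the ASIC's logic; it assumes that every address in the ASIC window below 440 000H is a local register (0 wait states), and the function names are hypothetical.

```python
ASIC_BASE = 0x400000    # start of the ASIC window in C31 address space
ASIC_END = 0x44FFFF     # end of the ASIC window (inclusive)
IO_BUS_BASE = 0x440000  # I/O bus window: 440000H through 44FFFFH


def wait_states(addr: int) -> int:
    """Return the wait states the ASIC inserts for a C31 access to addr."""
    if not ASIC_BASE <= addr <= ASIC_END:
        raise ValueError(f"address {addr:#08x} is not decoded to the ASIC")
    # I/O bus accesses get 3 wait states; local registers (incl. FIFOs) get 0.
    return 3 if addr >= IO_BUS_BASE else 0


def read_narrow_register(raw: int, width_bits: int) -> int:
    """Model how an 8- or 16-bit register appears in the 32-bit C31 data word:
    aligned to bit 0, with all higher bits read as 0."""
    if width_bits not in (8, 16, 32):
        raise ValueError("register width must be 8, 16, or 32 bits")
    return raw & ((1 << width_bits) - 1)
```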
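A 2-word FIFO with full, half, and empty flags can be modeled in a few lines. This is a behavioral sketch only: it assumes "half" means at least half of the FIFO depth is occupied, it assumes an overrunning word is dropped and flagged, and it does not model the receive/transmit polarity difference mentioned above. The class name `WordFifo` is hypothetical.

```python
from collections import deque
from typing import Optional


class WordFifo:
    """Sketch of a 2-word-deep FIFO of 32-bit words with status flags."""

    DEPTH = 2

    def __init__(self) -> None:
        self._words: deque = deque()
        self.overrun = False   # latched when a word arrives while full
        self.underrun = False  # latched when a read is attempted while empty

    def push(self, word: int) -> None:
        if self.full:
            self.overrun = True  # assumption: the incoming word is dropped
            return
        self._words.append(word & 0xFFFFFFFF)

    def pop(self) -> Optional[int]:
        if self.empty:
            self.underrun = True
            return None
        return self._words.popleft()

    @property
    def empty(self) -> bool:
        return len(self._words) == 0

    @property
    def half(self) -> bool:
        return len(self._words) >= self.DEPTH // 2

    @property
    def full(self) -> bool:
        return len(self._words) == self.DEPTH
```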
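The transparent-mode byte and bit ordering can be made concrete with a short sketch: a 32-bit FIFO word is split into bytes 0 through 3 (byte 0 being bits 0 through 7), each byte is sent bit 0 first, and the receiver assembles four bytes before writing a word into its FIFO. The function and class names are hypothetical; the sketch models the ordering rules only, not the ISAC SSI timing.

```python
def word_to_bytes(word: int) -> list:
    """Split a 32-bit FIFO word into B-channel bytes in transmit order."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def byte_to_bits(byte: int) -> list:
    """Serialize one byte with bit 0 sent first."""
    return [(byte >> i) & 1 for i in range(8)]


class TransparentReceiver:
    """Assemble incoming bytes into words; a word enters the FIFO after 4 bytes."""

    def __init__(self) -> None:
        self._pending: list = []
        self.fifo: list = []

    def receive_byte(self, byte: int) -> None:
        self._pending.append(byte & 0xFF)
        if len(self._pending) == 4:
            b0, b1, b2, b3 = self._pending
            self.fifo.append(b0 | (b1 << 8) | (b2 << 16) | (b3 << 24))
            self._pending.clear()
```

Because each byte is sent bit 0 first and byte 0 goes out first, the overall bit stream is simply bits 0 through 31 of the word in order, and bits 0, 8, 16, and 24 each land in bit 0 of a B-channel byte.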
The ISAC I/O Interface provides an 8-bit multiplexed data bus used to access the Siemens PEB2085s (ISAC). The 8 bits of I/O address come from bits 0 through 7 of the C31 address. Reads and writes to this interface add 3 wait-states to the C31 access cycle. Buffered writes are not supported in this version of the ASIC.
Each ISAC is mapped directly into its own 64 byte address space (6 valid bits of address). Accesses to the ISAC are 8 bits wide and are located at bit positions 0 to 7 in the C31 32 bit word. Bits 8 through 23 are returned as "0"s on reads.
The PEB2085s provide D-channel access through this interface.
The Accelerator Module Interface is a high-bandwidth serial communication path between the C31 and another processor, which will be used to add MIPS to the board. Certain future requirements, such as G.728 audio compression, will require the extra processing power.
The data transfers are 32-bit words sent serially at about 1.5 Mbits/s. The VC ASIC buffers these transfers with FIFOs that are 2 words deep to reduce interrupt overhead and response time requirements. The status registers provide flags for FIFO full, half, empty, and over/under-run (an under-run should never occur). Any of these can be used as an interrupt source, as selected in the Serial Port Mask register.
The following paragraphs describe the ISA interface of the audio/comm board. The ISA interface is the gate array that provides an interface between the multi-function board and the ISA bus. Further, the ASIC controls background tasks among the DSP, the stereo audio codec (SAC), and the analog phone line interfaces. The technology chosen for the ASIC is the 1-micron CMOS-6 family from NEC.
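A back-of-the-envelope calculation shows why the 2-word FIFOs reduce interrupt overhead. This sketch uses the approximate 1.5 Mbit/s rate from the text and assumes the C31 services a fixed number of words per interrupt; the actual interrupt condition is whichever flag is selected in the Serial Port Mask register, so the figures are illustrative only.

```python
BIT_RATE_BPS = 1.5e6   # "about 1.5 Mbits/s" per the text (approximate)
WORD_BITS = 32         # each transfer is one 32-bit word
FIFO_DEPTH_WORDS = 2   # the FIFOs are 2 words deep


def word_time_us(bit_rate_bps: float = BIT_RATE_BPS) -> float:
    """Microseconds needed to shift one 32-bit word across the serial link."""
    return WORD_BITS / bit_rate_bps * 1e6


def fifo_fill_time_us(bit_rate_bps: float = BIT_RATE_BPS) -> float:
    """Microseconds for an empty FIFO to fill completely."""
    return FIFO_DEPTH_WORDS * word_time_us(bit_rate_bps)


def interrupts_per_second(words_per_interrupt: int,
                          bit_rate_bps: float = BIT_RATE_BPS) -> float:
    """Interrupt rate if the C31 moves `words_per_interrupt` words per interrupt."""
    return bit_rate_bps / WORD_BITS / words_per_interrupt
```

At about 1.5 Mbit/s, one word takes roughly 21.3 µs, so interrupting once per word would cost 46,875 interrupts per second, while draining both FIFO words per interrupt halves that to 23,437.5 and gives the C31 roughly 42.7 µs of buffering.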
Referring now to FIG. 32, there is shown a diagram of the audio/comm board environment. The following is a description of the signal groups.
______________________________________
ISA Bus Signals
AEN: The address enable signal is used to de-gate the CPU and other devices from the bus during DMA cycles. When this signal is active (high), the DMA controller has control of the bus. The ASIC does not respond to bus cycles when AEN is active.
IOCS16#: The I/O 16-bit chip select is used by 16-bit I/O devices to indicate that they can accommodate a 16-bit transfer. This signal is decoded from the address only.
IOW#: This is an active low signal indicating that an I/O write cycle is being performed.
IOR#: This is an active low signal indicating that an I/O read cycle is being performed.
IRQ3, IRQ4, IRQ5, IRQ9, IRQ10, IRQ11, IRQ15: These signals are interrupt requests. An interrupt request is generated when an IRQ is raised from low to high. The IRQ must remain high until the interrupt service routine acknowledges the interrupt.
RESET: This signal is used to initialize system logic upon power on.
SBHE#: The system bus high enable signal indicates that data should be driven onto the upper byte of the 16-bit data bus.
SA(9:0): These are the system address lines used to decode the I/O address space used by the board. This scheme is compatible with the ISA bus. These addresses are valid during the entire command cycle.
SD(15:0): These are the system data bus lines.
DSP Signals
H1CLK: H1CLK is the DSP primary bus clock. All events on the primary bus are referenced to this clock. The frequency of this clock is half the frequency of the clock driving the DSP. See the TMS320C31 data manual, chapter 13.
D(31:0): These are the DSP 32-bit data bus lines. Data lines 16, 17, and 18 also interface to the EEPROM. Note that the DSP must be in reset and the data bus tri-stated before the EEPROM is accessed. This data bus also supplies the board ID when read while the DSP is in reset (see HAUTOID register).
C31_RST#: This is the DSP active low reset signal.
A23-A0: These DSP address lines are used by the ASIC to decode the address space.
R/W#: This signal indicates whether the current DSP external access is a read (high) or a write (low).
STRB#: This is an active low signal from the DSP indicating that the current cycle is to the primary bus.
RDY#: This signal indicates that the current cycle being performed on the primary bus of the DSP can be completed.
HOLD#: The Hold signal is an active low signal used to request that the DSP relinquish control of the primary bus. Once the hold has been acknowledged, all address, data, and status lines are tri-stated until Hold is released. This signal will be used to implement DMA and DRAM refresh.
HOLDA#: This is the Hold Acknowledge signal, the active low indication that the DSP has relinquished control of the bus.
INT2#: This C31 interrupt is used by the ASIC for DMA and Command interrupts.
INT1#: Interrupts the C31 on COM Port events.
INT0#: Analog Phone Interrupts.
Memory Signals
MEMWR1#, MEMWR2#: These signals are active low write strobes for memory banks 1 and 2.
B1OE#, B2OE#: These signals are active low output enables for memory banks 1 and 2.
SR_CS#: This is an active low chip select for the SRAM that makes up bank 2.
CAS#: This is the active low column address strobe to the DRAM.
RAS#: This is the active low row address strobe to the DRAM.
H1D12, H1D24: These signals are 12 ns and 24 ns delays of H1CLK.
MUX: MUX is the signal that controls the external DRAM address mux. When this signal is low, the CAS addresses are selected; when it is high, the RAS addresses are selected.
EEPROM Signals
EESK: This is the EEPROM clock signal. This signal is multiplexed with the DSP data signal D16. This signal can only be valid while the DSP is in reset.
EEDI: This is the input data signal to the EEPROM. This signal is multiplexed with the DSP data signal D17. This signal can only be valid while the DSP is in reset.
EEDO: This is the data output of the EEPROM. This signal is multiplexed with the DSP data signal D18. This signal can only be valid while the DSP is in reset.
EECS: This is the chip select signal for the EEPROM. This signal is NOT multiplexed and can only be driven active (high) during DSP reset.
Stereo Audio Codec (SAC)
SP_DC: This signal controls the SAC mode of operation. When this signal is high, the SAC is in data or master mode. When this signal is low, the SAC is in control or slave mode.
SP_SCLK: This is the Soundport clock input signal. This clock originates from either the Soundport or the ASIC.
SP_SDIN: This is the serial data input from the Soundport. Data is shifted in on the falling edge of SP_SCLK.
SP_SDOUT: This is the serial data output signal for the Soundport. Data is shifted out on the rising edge of SP_SCLK.
SP_FSYNC: This is the frame synchronization signal for the Soundport. It originates from the ASIC when the Soundport is in slave mode or is being programmed in control mode. When the Soundport is in master mode, the frame sync originates from the Soundport and has a frequency equal to the sample rate.
CODEC Signals
24.576MHZ: This clock signal is used to derive the clocks used within the ASIC and the 2.048 MHz CODEC clock.
COD_FS1, COD_FS2, COD_FS3, COD_FS4: These signals are the CODEC frame syncs; each signal corresponds to one of the four CODECs.
COD_SDOUT: This signal is the serial data output signal of the CODECs.
COD_SDIN: This signal is the serial data input signal to the CODECs.
COD_SCLK: This is a 2.048 MHz clock used to clock data in and out of the four CODECs. Serial data is clocked out on the rising edge and in on the falling edge.
Analog Phone Signals
LPSENSL1: Line 1 off-hook loop current sense. If this signal is low and BYPSRLY1 is high, Set 1 has gone off hook. If this signal is low and BYPSRLY1 is low, the board has gone off hook. This signal is not latched and is therefore a real-time signal.
LPSENSPH1: Set 1 off-hook loop current sense. If this signal is low, Set 1 has gone off hook. This can only occur when BYPSRLY1 is low. This signal is not latched and is therefore a real-time signal.
LPSENSL2: Line 2 off-hook loop current sense. If this signal is low and BYPSRLY2 is high, Set 2 has gone off hook. If this signal is low and BYPSRLY2 is low, the board has gone off hook. This signal is not latched and is therefore a real-time signal.
LPSENSPH2: Set 2 off-hook loop current sense. If this signal is low, Set 2 has gone off hook. This can only occur when BYPSRLY2 is low. This signal is not latched and is therefore a real-time signal.
RINGDETL1: Line 1 Ring Detect. If this input signal is low, the line is ringing.
RINGDETL2: Line 2 Ring Detect. If this input signal is low, the line is ringing.
CALLDETL1: Call Detect for Line 1. This signal is cleared low by software to detect 1200 baud FSK data between the first and second rings.
CALLDETL2: Call Detect for Line 2. This signal is cleared low by software to detect 1200 baud FSK data between the first and second rings.
PDOHL1: Pulse Dial Off Hook for Line 1. This signal is pulsed to dial phone numbers on pulse dial systems. It is also used to take the line off hook when low.
PDOHL2: Pulse Dial Off Hook for Line 2. This signal is pulsed to dial phone numbers on pulse dial systems. It is also used to take the line off hook when low.
BYPSRLY1, BYPSRLY2: These are active low output signals controlling the bypass relays. When high, the board is bypassed and the line (1 or 2) is connected to the desk set (1 or 2).
LOOPDIS
SWCLR#
Miscellaneous Signals
6.144MHZ: This is a 6.144 MHz clock signal used to drive a module that can be attached to the board. The module then uses this signal to synthesize any frequency it requires.
TEST1, TEST2, TEST3, TEST4: These are four test pins used by the ASIC designers to decrease ASIC manufacturing test vectors. The TEST2 pin is the output of the NAND tree used by ATE.
VDD, VSS
______________________________________
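The loop-sense rules for one line (LPSENSLn, LPSENSPHn, and BYPSRLYn) can be expressed as a small decode function. This sketch takes raw logic levels (0 = low, 1 = high) as inputs; the function name and its (board, set) return convention are hypothetical. Because these signals are not latched, a real driver would have to sample them at the moment the decision is made.

```python
def decode_off_hook(lpsensl: int, lpsensph: int, bypsrly: int) -> tuple:
    """Return (board_off_hook, set_off_hook) for one line.

    lpsensl:  line loop current sense (active low)
    lpsensph: desk set loop current sense (active low)
    bypsrly:  bypass relay control (high = board bypassed)
    """
    if bypsrly:
        # Relay high: the desk set is wired straight to the line, so line loop
        # current can only mean the set is off hook; LPSENSPH is not meaningful.
        return (False, lpsensl == 0)
    # Relay low: the board is in the line path, so line loop current means the
    # board is off hook, and the set is sensed separately on LPSENSPH.
    return (lpsensl == 0, lpsensph == 0)
```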
Those skilled in the art will understand that the present invention may comprise configurations of audio/comm board 206 other than the preferred configuration of FIG. 6.
Software Architecture for Conferencing System
The software architecture of conferencing system 100 shown in FIGS. 2 and 5 has three layers of abstraction. A computer supported collaboration (CSC) infrastructure layer comprises the hardware (i.e., video board 204 and audio/comm board 206) and host/board driver software (i.e., host interface 526 and DSP interface 528) to support video, audio, and comm, as well as the encode method for video (running on video board 204) and encode/decode methods for audio (running on audio/comm board 206). The capabilities of the CSC infrastructure are provided to the upper layer as a device driver interface (DDI).
A CSC system software layer provides services for instantiating and controlling the video and audio streams, synchronizing the two streams, and establishing and gracefully ending a call and associated communication channels. This functionality is provided in an application programming interface (API). This API comprises the extended audio and video interfaces and the communications APIs (i.e., conferencing API 506, video API 508, video manager 516, video capture driver 522, comm API 510, comm manager 518, Wave API 514, Wave driver 524, audio API 512, and audio manager 520).
A CSC applications layer brings CSC to the desktop. The CSC applications may include video annotation to video mail, video answering machine, audio/video/data conferencing (i.e., audio/video conferencing application 502 and data conferencing application 504), and group decision support systems.
Audio/video conferencing application 502 and data conferencing application 504 rely on conferencing API 506, which in turn relies upon video API 508, comm API 510, and audio API 512 to interface with video manager 516, comm manager 518, and audio manager 520, respectively. Comm API 510 and comm manager 518 provide a transport-independent interface (TII) that provides communications services to conferencing applications 502 and 504. The communications software of conferencing system 100 supports different transport mechanisms, such as ISDN (e.g., V.120 interface), SW56 (e.g., BATP's Telephone API), and LAN (e.g., SPX/IPX, TCP/IP, or NetBIOS). The TII isolates the conferencing applications from the underlying transport layer (i.e., transport-medium-specific DSP interface 528). The TII hides the network/connectivity specific operations. In conferencing system 100, the TII hides the ISDN layer. The DSP interface 528 is hidden in the datalink module (DLM). The TII provides services to the conferencing applications for opening communication channels (within the same session) and dynamically managing the bandwidth. The bandwidth is managed through the transmission priority scheme.
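The layering described above can be sketched as an abstract transport hidden behind a transport-independent interface. This is a hedged illustration, not comm API 510: the class and method names are hypothetical, and the priority scheme shown (a higher-priority channel is always drained first, within a per-interval byte budget) is one plausible reading of "transmission priority", assumed here for illustration.

```python
import heapq
from abc import ABC, abstractmethod


class Transport(ABC):
    """Transport-specific layer hidden below the TII (e.g., an ISDN datalink)."""

    @abstractmethod
    def send(self, channel_id: int, data: bytes) -> None: ...


class IsdnTransport(Transport):
    """Stand-in for the ISDN path; records what would go to DSP interface 528."""

    def __init__(self) -> None:
        self.sent: list = []

    def send(self, channel_id: int, data: bytes) -> None:
        self.sent.append((channel_id, data))


class TransportIndependentInterface:
    """Applications open channels and queue data; the transport stays hidden."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        self._priority: dict = {}
        self._queue: list = []  # heap of (-priority, arrival order, channel, data)
        self._seq = 0
        self._next_channel = 0

    def open_channel(self, priority: int) -> int:
        channel = self._next_channel
        self._next_channel += 1
        self._priority[channel] = priority
        return channel

    def queue_send(self, channel: int, data: bytes) -> None:
        heapq.heappush(self._queue, (-self._priority[channel], self._seq, channel, data))
        self._seq += 1

    def flush(self, budget_bytes: int) -> None:
        """Send queued messages in priority order until the byte budget is spent."""
        while self._queue and len(self._queue[0][3]) <= budget_bytes:
            _, _, channel, data = heapq.heappop(self._queue)
            self._transport.send(channel, data)
            budget_bytes -= len(data)
```

Swapping `IsdnTransport` for a LAN transport would leave application code unchanged, which is the point of the TII; the byte budget stands in for the available bandwidth in each transmission interval.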
In a preferred embodiment in which conferencing system 100 performs software video decoding, AVI capture driver 522 is implemented on top of host interface 526 (the video driver). In an alternative preferred embodiment in which conferencing system 100 performs hardware video decoding, an AVI display driver is also implemented on top of host interface 526.
The software architecture of conferencing system 100 comprises three major subsystems: video, audio, and communication. The audio and video subsystems are decoupled and treated as "data types" (similar to text or graphics) with conventional operations like open, save, edit, and display. The video and audio services are available to the applications through video-management and audio-management extended interfaces, respectively.
Audio/Video Conferencing Application
Audio/video conferencing application 502 implements the conferencing user interface. Conferencing application 502 is implemented as a Microsoft® Windows 3.1 application. One child window will display the local video image and a second child window will display the remote video image. Audio/video conferencing application 502 provides the following services to conferencing system 100:
Manage main message loop.
Perform initialization and register classes.
Handle menus.
Process toolbar messages.
Handle preferences.
Handle speed dial setup and selections.
Connect and hang up.
Handle handset window.
Handle remote video.
Handle remote video window.
Handle local video.
Handle local video window.
Data Conferencing Application
Data conferencing application 504 implements the data conferencing user interface. Data conferencing application 504 is implemented as a Microsoft® Windows 3.1 application. The data conferencing application uses a "shared notebook" metaphor. The shared notebook lets the user copy a file from the computer into the notebook and review it with a remote user during a call. While the users share the notebook (a period called a "meeting"), they see the same information on their computers, can review it together, and can make notes directly in the notebook. A copy of the original file is placed in the notebook, so the original remains unchanged. The notes users make during the meeting are saved with the copy in a meeting file. The shared notebook looks like a notebook or stack of paper. Conference participants have access to the same pages. Either participant can create a new page and fill it with information or make notes on an existing page.
Conferencing API
Conferencing API 506 of FIG. 5 facilitates the easy implementation of conferencing applications 502 and 504. Conferencing API 506 of FIG. 5 provides a generic conferencing interface between conferencing applications 502 and 504 and the video, comm, and audio subsystems. Conferencing API 506 provides a high-level abstraction of the services that individual subsystems (i.e., video, audio, and comm) support. The major services include:
Making, accepting, and hanging-up calls.
Establishing and terminating multiple communication channels for individual subsystems.
Instantiating and controlling local video and audio.
Sending video and audio to a remote site through the network.
Receiving, displaying, and controlling the remote video and audio streams.
Conferencing applications 502 and 504 can access these services through the high-level conferencing API 506 without worrying about the complexities of low-level interfaces supported in the individual subsystems.
In addition, conferencing API 506 facilitates the integration of individual software components. It minimizes the interactions between conferencing applications 502 and 504 and the video, audio, and comm subsystems. This allows the individual software components to be developed and tested independent of each other. Conferencing API 506 serves as an integration point that glues different software components together. Conferencing API 506 facilitates the portability of audio/video conferencing application 502.
Conferencing API 506 is implemented as a Microsoft Windows Dynamic Link Library (DLL). Conferencing API 506 translates the function calls from conferencing application 502 to the more complicated calls to the individual subsystems (i.e., video, audio, and comm). The subsystem call layers (i.e., video API 508, comm API 510, and audio API 512) are also implemented in DLLs. As a result, the programming of conferencing API 506 is simplified in that conferencing API 506 does not need to implement more complicated schemes, such as dynamic data exchange (DDE), to interface with other application threads that implement the services for individual subsystems. For example, the video subsystem will use window threads to transmit/receive streams of video to/from the network.
Conferencing API 506 is the central control point for supporting communication channel management (i.e., establishing, terminating channels) for video and audio subsystems. Audio/video conferencing application 502 is responsible for supporting communication channel management for the data conferencing streams.
Referring now to FIG. 7, there is shown a block diagram of the conferencing interface layer 700 between conferencing applications 502 and 504 of FIG. 5, on one side, and comm manager 518, video manager 516, and audio manager 520, on the other side, according to a preferred embodiment of the present invention. Conferencing API 506 of FIG. 5 comprises conferencing primitive validator 704, conferencing primitive dispatcher 708, conferencing callback 706, and conferencing finite state machine (FSM) 702 of conferencing interface layer 700 of FIG. 7. Comm API 510 of FIG. 5 comprises comm primitive 712 and comm callback 710 of FIG. 7. Video API 508 of FIG. 5 comprises video primitive 716 of FIG. 7. Audio API 512 of FIG. 5 comprises audio primitive 720 of FIG. 7.
Conferencing primitive validator 704 validates the syntax (e.g., checks the conferencing call state, channel state, and the stream state with the conferencing finite state machine (FSM) 702 table and verifies the correctness of individual parameters) of each API call. If an error is detected, primitive validator 704 terminates the call and returns the error to the application immediately. Otherwise, primitive validator 704 calls conferencing primitive dispatcher 708, which determines which subsystem primitives to invoke next.
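The validate-then-dispatch flow can be sketched as a table lookup followed by parameter checks. Everything here is hypothetical and for illustration only: the call states, the API call names (`make_call`, `open_channel`, `hang_up`), and the single parameter rule stand in for the actual FSM 702 table and conferencing API primitives, which the text does not enumerate.

```python
from enum import Enum, auto
from typing import Any, Callable


class CallState(Enum):
    IDLE = auto()
    CONNECTING = auto()
    CONNECTED = auto()


# Hypothetical FSM table: the API calls that are legal in each call state.
LEGAL_CALLS = {
    CallState.IDLE: {"make_call"},
    CallState.CONNECTING: {"hang_up"},
    CallState.CONNECTED: {"open_channel", "hang_up"},
}


class ConferencingError(Exception):
    """Error returned to the application immediately by the validator."""


def validate(call: str, state: CallState, params: dict) -> None:
    # 1. State check against the FSM table.
    if call not in LEGAL_CALLS[state]:
        raise ConferencingError(f"{call} is not allowed in state {state.name}")
    # 2. Parameter check (one illustrative rule).
    if call == "make_call" and not params.get("number"):
        raise ConferencingError("make_call requires a non-empty 'number'")


def api_entry(call: str, state: CallState, params: dict,
              dispatch: Callable[[str, dict], Any]) -> Any:
    """Validate first; only valid calls ever reach the dispatcher."""
    validate(call, state, params)
    return dispatch(call, params)
```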
Conferencing primitive dispatcher 708 dispatches and executes the next conferencing API primitive to start or continue to carry out the service requested by the application. Primitive dispatcher 708 may be invoked either directly from primitive validator 704 (i.e., to start the first of a set of conferencing API primitives) or from conferencing callback 706 to continue the unfinished processing (for asynchronous API calls). Primitive dispatcher 708 chooses the conferencing API primitives based on the information of the current state, the type of message/event, and the next primitive being scheduled by the previous conferencing API primitive.
After collecting and analyzing the completion status from each subsystem, primitive dispatcher 708 either (1) returns the concluded message back to the conferencing application by returning a message or invoking the application-provided callback routine or (2) continues to invoke another primitive to continue the unfinished processing.
There is a set of primitives (i.e., comm primitives 712, video primitives 716, and audio primitives 720) implemented for each API call. Some primitives are designed to be invoked from a callback routine to carry out the asynchronous services.
The subsystem callback routine (i.e., comm callback 710) returns the completion status of an asynchronous call to the comm subsystem to conferencing callback 706, which will conduct analysis to determine the proper action to take next. The comm callback 710 is implemented as a separate thread of execution (vthread.exe) that receives the callback Microsoft® Windows messages from the comm manager and then calls VCI DLL to handle these messages.
Conferencing callback 706 returns the completion status of an asynchronous call to the application. Conferencing callback 706 checks the current message/event type, analyzes the type against the current conferencing API state and the next primitive being scheduled to determine the actions to take (e.g., invoke another primitive or return the message to the application). If the processing is not complete yet, conferencing callback 706 selects another primitive to continue the rest of the processing. Otherwise, conferencing callback 706 returns the completion status to the application. The conferencing callback 706 is used only for comm related conferencing API functions; all other conferencing API functions are synchronous.
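The interplay between the dispatcher and the callback for asynchronous calls can be sketched as a resumable sequence of primitives. In this hypothetical model, a primitive returns True when it finished synchronously and False when it started an asynchronous subsystem operation; the completion status later arrives through `on_callback`, which either resumes the sequence or reports failure. The class and method names are illustrative, not the conferencing API's.

```python
from typing import Callable


class Dispatcher:
    """Runs the primitives for one API call, pausing on asynchronous steps."""

    def __init__(self, primitives: list, notify_app: Callable[[str], None]) -> None:
        self._primitives = primitives  # each is a zero-argument callable -> bool
        self._next = 0
        self._notify_app = notify_app

    def run(self) -> None:
        # Entered from the validator (first primitive) or from the callback
        # (continuation of unfinished processing).
        while self._next < len(self._primitives):
            primitive = self._primitives[self._next]
            self._next += 1
            if not primitive():
                return  # asynchronous: wait for the subsystem callback
        self._notify_app("complete")

    def on_callback(self, status: str) -> None:
        """Conferencing-callback role: analyze the status and pick the next action."""
        if status != "ok":
            self._next = len(self._primitives)  # abandon the remaining primitives
            self._notify_app(f"failed: {status}")
            return
        self.run()
```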
The major services supported by conferencing API 506 are categorized as follows:
Call and Channel Services (establish/terminate a conference call and channels over the call).
Stream Services (capture, play, record, link, and control the multimedia audio and video streams).
Data Services (access and manipulate data from the multimedia streams).
Interfacing with the Comm Subsystem
Conferencing API 506 supports the following comm services with the comm subsystem:
Call establishment--place a call to start a conference.
Channel establishment--establish four comm channels for incoming video, incoming audio, outgoing video, and outgoing audio. These 4 channels are opened implicitly as part of call establishment, and not through separate APIs. The channel APIs are for other channels (e.g., data conferencing).
Call termination--hang up a call and close all active channels.
Call Establishment
Establishment of a call between the user of conferencing system A of FIG. 1 and the user of conferencing system B of FIG. 1 is implemented as follows:
Conferencing APIs A and B call BeginSession to initialize their comm subsystems.
Conferencing API A calls MakeConnection to dial conferencing API B's number.
Conferencing API B receives a CONN_REQUESTED callback.
Conferencing API B sends the call notification to the graphic user interface (GUI); and if user B accepts the call via the GUI, conferencing API B proceeds with the following steps.
Conferencing API B calls AcceptConnection to accept the incoming call from conferencing API A.
Conferencing APIs A and B receive the CONN_ACCEPTED message.
Conferencing APIs A and B call RegisterChanMgr for channel management.
Conferencing API A calls OpenChannel to open the audio channel.
Conferencing API B receives the Chan_Requested callback and accepts it via AcceptChannel.
Conferencing API A receives the Chan_Accepted callback.
The last three steps are repeated for the video channel and the control channel.
Conferencing API A then sends the business card information on the control channel, which conferencing API B receives.
Conferencing API B then turns around and repeats the above 6 steps (i.e., opens its outbound channels for audio/video/control and sends its business card information on its control channel).
Conferencing APIs A and B then notify the conferencing applications with a CFM_ACCEPT_NTFY callback.
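The ordering of the steps above, including the repetition per channel and B's turnaround, can be made explicit by generating the event trace. This is a sketch of the accepted-call path only (the text does not describe rejection handling); the trace strings are illustrative labels built from the call and callback names in the steps above.

```python
def establish_call() -> list:
    """Return the ordered event trace for an accepted call from A to B."""
    trace = ["A: BeginSession", "B: BeginSession",
             "A: MakeConnection", "B: CONN_REQUESTED",
             "B: AcceptConnection",
             "A: CONN_ACCEPTED", "B: CONN_ACCEPTED",
             "A: RegisterChanMgr", "B: RegisterChanMgr"]
    # A opens its outbound channels first; then B "turns around" and does the same.
    for opener, peer in (("A", "B"), ("B", "A")):
        for kind in ("audio", "video", "control"):
            trace += [f"{opener}: OpenChannel({kind})",
                      f"{peer}: Chan_Requested({kind}) -> AcceptChannel",
                      f"{opener}: Chan_Accepted({kind})"]
        trace += [f"{opener}: send business card on control channel",
                  f"{peer}: receive business card"]
    trace += ["A: CFM_ACCEPT_NTFY", "B: CFM_ACCEPT_NTFY"]
    return trace
```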
Channel Establishment
Video and audio channel establishment is implicitly done as part of call establishment, as described above, and need not be repeated here. For establishing other channels such as data conferencing, the conferencing API passes through the request to the comm manager, and sends the comm manager's callback to the user's channel manager.
Call Termination
Termination of a call between users A and B is implemented as follows (assuming user A hangs up):
Conferencing API A unlinks local/remote video/audio streams from the network.
Conferencing API A then calls the comm manager's CloseConnection.
The comm manager implicitly closes all channels and sends Chan_Closed callbacks to conferencing API A.
Conferencing API A closes its remote audio and video streams on receipt of the Chan_Closed callbacks for its inbound audio and video channels, respectively.
Conferencing API A then receives the CONN_CLOSE_RESP from the comm manager after the call is cleaned up completely. Conferencing API A notifies its application via a CFM_HANGUP_NTFY message.
In the meantime, the comm manager on B will have received the hang-up notification, closed its end of all the channels, and notified conferencing API B via Chan_Closed.
Conferencing API B closes its remote audio and video streams on receipt of the Chan_Closed callbacks for its inbound audio and video channels, respectively.
Conferencing API B unlinks its local audio and video streams from the network on receipt of the Chan_Closed callbacks for its outbound audio and video channels, respectively.
Conferencing API B then receives a CONN_CLOSED notification from its comm manager. Conferencing API B notifies its application via a CFM_HANGUP_NTFY message.
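The teardown rule on B's side (inbound channel closed means close the remote stream, outbound channel closed means unlink the local stream) can be sketched as a small callback handler. The class `RemoteEndpoint`, its method names, and the choice to ignore the control channel are assumptions made for illustration; they are not the conferencing API's actual callback interface.

```python
from dataclasses import dataclass, field
from typing import List

MEDIA_STREAMS = ("audio", "video")


@dataclass
class RemoteEndpoint:
    """Hypothetical model of how the hung-up-on side (conferencing API B)
    reacts to teardown notifications from its comm manager."""
    actions: List[str] = field(default_factory=list)

    def on_chan_closed(self, media: str, direction: str) -> None:
        if media not in MEDIA_STREAMS:
            return  # e.g. the control channel: no stream to tear down here
        if direction == "inbound":
            # Inbound channels feed the remote (played) streams.
            self.actions.append(f"close remote {media} stream")
        elif direction == "outbound":
            # Outbound channels carry the local (captured) streams.
            self.actions.append(f"unlink local {media} stream")
        else:
            raise ValueError(f"unknown direction: {direction!r}")

    def on_conn_closed(self) -> None:
        self.actions.append("notify application: CFM_HANGUP_NTFY")


def tear_down_b() -> List[str]:
    b = RemoteEndpoint()
    # The comm manager closes every channel, then reports CONN_CLOSED.
    for media in ("audio", "video", "control"):
        for direction in ("inbound", "outbound"):
            b.on_chan_closed(media, direction)
    b.on_conn_closed()
    return b.actions
```

Keying the action on channel direction rather than on which side hung up lets the same handler serve both A and B.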
Interfacing with the Audio and Video Subsystems
Conferencing API 506 supports the following services with the audio and video subsystems:
Capture/monitor/transmit local video streams.
Capture/transmit local audio streams.
Receive/play remote streams.
Control local/remote streams.
Snap an image from local video stream.
Since the video and audio streams are closely synchronized, the audio and video subsystem services are described together.
Capture/Monitor/Transmit Local Streams
The local video and audio streams are captured and monitored as follows:
Call AOpen to open the local audio stream.
Call VOpen to open the local video stream.
Call ACapture to capture the local audio stream from the local hardware.
Call VCapture to capture the local video stream from the local hardware.
Call VMonitor to monitor the local video stream.
Transmission of the local video and audio streams to the remote site is started as follows:
Call ALinkOut to connect the local audio stream to an output network channel.
Call VLinkOut to connect the local video stream to an output network channel.
The monitoring of the local video stream locally is stopped as follows:
Call VMonitor(off) to stop monitoring the local video stream.
Receive/Play Remote Streams
Remote streams are received from the network and played as follows:
Call AOpen to open the remote audio stream.
Call VOpen to open the remote video stream.
Call ALinkIn to connect the remote audio stream to an input network channel.
Call VLinkIn to connect the remote video stream to an input network channel.
Call APlay to play the received remote audio stream.
Call VPlay to play the received remote video stream.
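The two call sequences above imply ordering constraints: a stream must be opened before it is captured or linked, and linked in before it is played. The sketch below checks a sequence against those constraints. The `REQUIRES` table is an assumption inferred from the order in which the steps are listed, not a documented rule set, and the Python function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Prerequisite call for each step, inferred from the order the text lists
# them (an assumption, not a documented rule set).
REQUIRES: Dict[str, str] = {
    "ACapture": "AOpen",
    "VCapture": "VOpen",
    "VMonitor": "VCapture",
    "ALinkOut": "ACapture",
    "VLinkOut": "VCapture",
    "ALinkIn": "AOpen",
    "VLinkIn": "VOpen",
    "APlay": "ALinkIn",
    "VPlay": "VLinkIn",
}

# Local streams: capture, monitor, then start transmitting.
CAPTURE_AND_SEND = ["AOpen", "VOpen", "ACapture", "VCapture",
                    "VMonitor", "ALinkOut", "VLinkOut"]
# Remote streams: open, link to input network channels, then play.
RECEIVE_AND_PLAY = ["AOpen", "VOpen", "ALinkIn", "VLinkIn", "APlay", "VPlay"]


def check_order(calls: Iterable[str]) -> List[str]:
    """Return ordering violations; an empty list means every call's
    prerequisite appeared earlier in the sequence."""
    seen = set()
    problems: List[str] = []
    for call in calls:
        need = REQUIRES.get(call)
        if need is not None and need not in seen:
            problems.append(f"{call} called before {need}")
        seen.add(call)
    return problems
```

Both listed sequences pass with no violations, while a sequence such as `["VOpen", "VPlay"]` is reported as `["VPlay called before VLinkIn"]`.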
Control Local/Remote Streams
The local video and audio streams are paused as follows:
Call VLinkOut(off) to stop sending local video on the network.
Call AMute to stop sending local audio on the network.
The remote video and audio streams are paused as follows:
If CF_PlayStream(off) is called, the conferencing API calls APlay(off) and VPlay(off).
The local/remote video/audio streams are controlled as follows:
Call ACntl to control the gain of a local audio stream or the volume of the remote audio stream.
Call VCntl to control such parameters as the brightness, tint, contrast, and color of a local or remote video stream.
Snap an Image from Local Video Streams
A snapshot of the local video stream is taken and returned as an image to the application as follows:
Call VGrabframe to grab the most current image from the local video stream.
Conferencing API 506 supports the following function calls by conferencing applications 502 and 504 to the video, comm, and audio subsystems:
______________________________________
CF_Init             Reads in the conferencing configuration parameters (e.g., pathname of the directory database and directory name in which the conferencing software is kept) from an initialization file; loads and initializes the software of the comm, video, and audio subsystems by allocating and building internal data structures; allows the application to choose between messages and callback routines for returning event notifications from the remote site.
CF_MakeCall         Makes a call to the remote site to establish a connection for conferencing. The call is performed asynchronously.
CF_AcceptCall       Accepts a call initiated from the remote site based on the information received in the CFM_CALL_NTFY message.
CF_RejectCall       Rejects an incoming call, if appropriate, upon receiving a CFM_CALL_NTFY message.
CF_HangupCall       Hangs up a call that was previously established; releases all resources, including all types of streams and data structures, allocated during the call.
CF_GetCallState     Returns the current state of the specified call.
CF_CapMon           Starts the capture of analog video signals from the local camera and displays the video in the local_video_window, which is pre-opened by the application. This function allows the user to preview his/her appearance before sending the signals out to the remote site.
CF_PlayRcvd         Starts the reception and display of remote video signals in the remote_video_window, which is pre-opened by the application; starts the reception and play of remote audio signals through the local speaker.
CF_Destroy          Destroys the specified stream group that was created by CF_CapMon or CF_PlayRcvd. As part of the destroy process, all operations (e.g., sending/playing) being performed on the stream group are stopped and all allocated system resources are freed.
CF_Mute             Uses AMute to turn on/off the mute function being performed on the audio stream of a specified stream group. This function temporarily stops or restarts the related operations, including playing and sending, being performed on this stream group. This function may be used to temporarily hold one audio stream and provide more bandwidth for other streams to use.
CF_SnapStream       Takes a snapshot of the video stream of the specified stream group and returns a still image (reference) frame to the application buffers indicated by the hBuffer handle.
CF_Control          Controls the capture or playback functions of the local or remote video and audio stream groups.
CF_SendStream       Uses ALinkOut to pause/unpause audio.
CF_GetStreamInfo    Returns the current state and the audio video control block (AVCB) data structure, preallocated by the application, of the specified stream groups.
CF_PlayStream       Stops/starts the playback of the remote audio/video streams by calling APlay/VPlay.
______________________________________
These functions are defined in further detail later in this specification in a section entitled "Data Structures, Functions, and Messages."
In addition, conferencing API 506 supports the following messages returned to conferencing applications 502 and 504 from the video, comm, and audio subsystems in response to some of the above-listed functions:
______________________________________
CFM_CALL_NTFY       Indicates that a call request initiated from the remote site has been received.
CFM_PROGRESS_NTFY   Indicates that a call state/progress notification has been received from the local phone system support.
CFM_ACCEPT_NTFY     Indicates that the remote site has accepted the call request issued locally. Also sent to the accepting application when CF_AcceptCall completes.
CFM_REJECT_NTFY     Indicates that the remote site has rejected, or the local site has failed to make, the call.
CFM_HANGUP_NTFY     Indicates that the remote site has hung up the call.
______________________________________
Referring now to FIG. 8, there is shown a representation of the conferencing call finite state machine (FSM) for a conferencing session between a local conferencing system (i.e., caller) and a remote conferencing system (i.e., callee), according to a preferred embodiment of the present invention. The possible conferencing call states are as follows:
______________________________________
CCST_NULL           Null state -- state of uninitialized caller/callee.
CCST_IDLE           Idle state -- state of caller/callee ready to make/receive calls.
CCST_CALLING        Calling state -- state of caller trying to call callee.
CCST_CALLED         Called state -- state of callee being called by caller.
CCST_CONNECTED      Connected state -- state of caller and callee during conferencing session.
CCST_CLOSING        Closing state -- a hangup or call cleanup is in progress.
______________________________________
In the CCST_CONNECTED state, the local application may begin capturing, monitoring, and/or sending the local audio/video signals to the remote application. At the same time, the local application may be receiving and playing the remote audio/video signals.
Referring now to FIG. 9, there is shown a representation of the conferencing stream FSM for each conferencing system participating in a conferencing session, according to a preferred embodiment of the present invention. The possible conferencing stream states are as follows:
______________________________________
CSST_INIT           Initialization state -- state of local and remote streams after the CCST_CONNECTED state is first reached.
CSST_ACTIVE         Capture state -- state of local stream being captured. Receive state -- state of remote stream being received.
CSST_FAILURE        Fail state -- state of local/remote stream after resource failure.
______________________________________
Conferencing stream FSM represents the states of both the local and remote streams of each conferencing system. Note that the local stream for one conferencing system is the remote stream for the other conferencing system.
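The stream FSM of FIG. 9 can be sketched as a transition table. This is a minimal model under stated assumptions: the CF_PlayRcvd edge from CSST_INIT to CSST_ACTIVE for the remote stream is inferred from the definition of CSST_ACTIVE ("remote stream being received") and from CF_Destroy later taking the remote stream out of CSST_ACTIVE; `resource_failure` is a hypothetical event label; and `stream_step` is an illustrative helper, not part of the conferencing API.

```python
from enum import Enum


class StreamState(Enum):
    CSST_INIT = "init"
    CSST_ACTIVE = "active"
    CSST_FAILURE = "failure"


S = StreamState

# (state, event) -> next state. Event strings are illustrative labels.
STREAM_FSM = {
    (S.CSST_INIT, "CF_CapMon"): S.CSST_ACTIVE,    # local stream starts capture
    (S.CSST_INIT, "CF_PlayRcvd"): S.CSST_ACTIVE,  # remote stream (assumed edge)
    (S.CSST_ACTIVE, "CF_Destroy"): S.CSST_INIT,
    (S.CSST_ACTIVE, "resource_failure"): S.CSST_FAILURE,
    (S.CSST_FAILURE, "CF_Destroy"): S.CSST_INIT,  # recovery
}

# Calls that never change the stream state, and the states they are allowed in.
IN_PLACE = {
    "CF_GetStreamInfo": set(S),
    "CF_SnapStream": {S.CSST_ACTIVE},
}


def stream_step(state: StreamState, event: str) -> StreamState:
    """Apply one event; raise ValueError if it is not valid in `state`."""
    if event in IN_PLACE:
        if state in IN_PLACE[event]:
            return state
    elif (state, event) in STREAM_FSM:
        return STREAM_FSM[(state, event)]
    raise ValueError(f"{event} is not valid in {state.name}")
```

Because the local stream of one system is the remote stream of the other, a single table can serve both; only the entry event into CSST_ACTIVE differs.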
In a typical conferencing session between a caller and a callee, both the caller and callee begin in the CCST_NULL call state of FIG. 8. The conferencing session is initiated by both the caller and callee calling the function CF_Init to initialize their own conferencing systems. Initialization involves initializing internal data structures, initializing communication and configuration information, opening a local directory database, verifying the local user's identity, and retrieving the user's profile information from the database. The CF_Init function takes both the caller and callee from the CCST_NULL call state to the CCST_IDLE call state. The CF_Init function also places both the local and remote streams of both the caller and callee in the CSST_INIT stream state of FIG. 9.
Both the caller and callee call the CF_CapMon function to start capturing local video and audio signals and playing them locally, taking both the caller and callee local streams from the CSST_INIT stream state to the CSST_ACTIVE stream state. Both the caller and callee may then call the CF_Control function to control the local video and audio signals, leaving all states unchanged.
The caller then calls the CF_MakeCall function to initiate a call to the callee, taking the caller from the CCST_IDLE call state to the CCST_CALLING call state. The callee receives and processes a CFM_CALL_NTFY message indicating that a call has been placed from the caller, taking the callee from the CCST_IDLE call state to the CCST_CALLED call state. The callee calls the CF_AcceptCall function to accept the call from the caller, taking the callee from the CCST_CALLED call state to the CCST_CONNECTED call state. The caller receives and processes a CFM_ACCEPT_NTFY message indicating that the callee accepted the call, taking the caller from the CCST_CALLING call state to the CCST_CONNECTED call state.
Both the caller and callee then call the CF_PlayRcvd function to begin reception and play of the video and audio streams from the remote site, leaving all states unchanged. Both the caller and callee call the CF_SendStream function to start sending the locally captured video and audio streams to the remote site, leaving all states unchanged. If necessary, both the caller and callee may then call the CF_Control function to control the remote video and audio streams, again leaving all states unchanged. The conferencing session then proceeds with no changes to the call and stream states. During the conferencing session, the application may call CF_Mute, CF_PlayStream, or CF_SendStream. These affect the state of the streams in the audio/video managers, but not the state of the stream group.
When the conferencing session is to be terminated, the caller calls the CF_HangupCall function to end the conferencing session, taking the caller from the CCST_CONNECTED call state to the CCST_IDLE call state. The callee receives and processes a CFM_HANGUP_NTFY message from the caller indicating that the caller has hung up, taking the callee from the CCST_CONNECTED call state to the CCST_IDLE call state.
Both the caller and callee call the CF_Destroy function to stop playing the remote video and audio signals, taking both the caller and callee remote streams from the CSST_ACTIVE stream state to the CSST_INIT stream state. Both the caller and callee also call the CF_Destroy function to stop capturing the local video and audio signals, taking both the caller and callee local streams from the CSST_ACTIVE stream state to the CSST_INIT stream state.
This described scenario is just one possible scenario. Those skilled in the art will understand that other scenarios may be constructed using the following additional functions and state transitions:
If the callee does not answer within a specified time period, the caller automatically calls the CF_HangupCall function to hang up, taking the caller from the CCST_CALLING call state to the CCST_IDLE call state.
The callee calls the CF_RejectCall function to reject a call from the caller, taking the callee from the CCST_CALLED call state to the CCST_IDLE call state. The caller then receives and processes a CFM_REJECT_NTFY message indicating that the callee has rejected the caller's call, taking the caller from the CCST_CALLING call state to the CCST_IDLE call state.
The callee (rather than the caller) calls the CF_HangupCall function to hang up, taking the callee from the CCST_CONNECTED call state to the CCST_IDLE call state. The caller receives a CFM_HANGUP_NTFY message from the callee indicating that the callee has hung up, taking the caller from the CCST_CONNECTED call state to the CCST_IDLE call state.
The CF_GetCallState function may be called by either the caller or the callee from any call state to determine the current call state without changing the call state.
During a conferencing session, an unrecoverable resource failure may occur in the local stream of either the caller or the callee, causing the local stream to be lost and taking the local stream from the CSST_ACTIVE stream state to the CSST_FAILURE stream state. Similarly, an unrecoverable resource failure may occur in the remote stream of either the caller or the callee, causing the remote stream to be lost and taking the remote stream from the CSST_ACTIVE stream state to the CSST_FAILURE stream state. In either case, the local site calls the CF_Destroy function to recover from the failure, taking the failed stream from the CSST_FAILURE stream state to the CSST_INIT stream state.
The CF_GetStreamInfo function may be called by the application from any stream state of either the local stream or the remote stream to determine information regarding the specified stream groups. The CF_SnapStream and CF_RecordStream functions may be called by the application for the local stream in the CSST_ACTIVE stream state, and CF_RecordStream may also be called for the remote stream in the CSST_ACTIVE stream state. All of the functions described in this paragraph leave the stream state unchanged.
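The call-state transitions described above (the typical scenario plus the alternatives) fit in one table. This is a minimal sketch: the dictionary encoding, `next_call_state`, and `run_call` are hypothetical helpers, and CCST_CLOSING is listed but given no transitions because the scenarios above move directly from CCST_CONNECTED to CCST_IDLE.

```python
from enum import Enum, auto
from typing import Iterable


class CallState(Enum):
    CCST_NULL = auto()
    CCST_IDLE = auto()
    CCST_CALLING = auto()
    CCST_CALLED = auto()
    CCST_CONNECTED = auto()
    CCST_CLOSING = auto()  # listed among the call states; unused by these scenarios


C = CallState

# (state, function or message) -> next state, per the scenarios in the text.
CALL_FSM = {
    (C.CCST_NULL, "CF_Init"): C.CCST_IDLE,
    (C.CCST_IDLE, "CF_MakeCall"): C.CCST_CALLING,        # caller
    (C.CCST_IDLE, "CFM_CALL_NTFY"): C.CCST_CALLED,       # callee
    (C.CCST_CALLED, "CF_AcceptCall"): C.CCST_CONNECTED,
    (C.CCST_CALLED, "CF_RejectCall"): C.CCST_IDLE,
    (C.CCST_CALLING, "CFM_ACCEPT_NTFY"): C.CCST_CONNECTED,
    (C.CCST_CALLING, "CFM_REJECT_NTFY"): C.CCST_IDLE,
    (C.CCST_CALLING, "CF_HangupCall"): C.CCST_IDLE,      # no-answer timeout
    (C.CCST_CONNECTED, "CF_HangupCall"): C.CCST_IDLE,    # local hang-up
    (C.CCST_CONNECTED, "CFM_HANGUP_NTFY"): C.CCST_IDLE,  # remote hang-up
}


def next_call_state(state: CallState, event: str) -> CallState:
    if event == "CF_GetCallState":
        return state  # legal from any call state; no transition
    try:
        return CALL_FSM[(state, event)]
    except KeyError:
        raise ValueError(f"{event} is not valid in {state.name}") from None


def run_call(events: Iterable[str], start: CallState = C.CCST_NULL) -> CallState:
    state = start
    for event in events:
        state = next_call_state(state, event)
    return state
```

Functions (CF_*) and received messages (CFM_*) share one event namespace here, which mirrors how the text treats both as triggers for call-state transitions.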
Video Subsystem
The video subsystem of conferencing system 100 of FIG. 5 comprises video API 508, video manager 516, video capture driver 522, and host interface 526 running on host processor 202 of FIG. 2 and video microcode 530 running on video board 204. The following sections describe each of these constituents of the video subsystem.
Video API
Video API 508 of FIG. 5 provides an interface between audio/video conferencing application 502 and the video subsystem. Video API 508 provides the following services:
______________________________________
Capture Service       Captures a single video stream continuously from a local video hardware source, for example, a video camera or VCR, and directs the video stream to a video software output sink (i.e., a network destination).
Monitor Service       Monitors the video stream being captured from the local video hardware in the local video window previously opened by the application. Note: This function intercepts and displays a video stream at the hardware board when the stream is first captured. This operation is similar to a "short circuit" or a UNIX tee and is different from the "play" function. The play function gets and displays the video stream at the host. In conferencing system 100, the distinction between the monitor and play services is not that one is on the board and the other at the host; both are carried out on the host (i.e., software playback). Rather, the distinction is this: the monitor service intercepts and displays, on the local system, a video stream that has been captured with the local hardware (generated locally). By contrast, the play service operates on a video stream that has been captured on a remote system's hardware and then sent to the local system (generated remotely).
Pause Service         Suspends capturing or playing of an active video stream; resumes capturing or playing of a previously suspended video stream.
Image Capture         Grabs the most current complete still image (called a reference frame) from the specified video stream and returns it to the application in the Microsoft® DIB (Device-Independent Bitmap) format.
Play Service          Plays a video stream continuously by consuming the video frames from a video software source (i.e., a network source).
Link-In Service       Links a video network source to be the input of a video stream played locally. This service allows applications to dynamically change the software input source of a video stream.
Link-Out Service      Links a network destination to be the output of a video stream captured locally. This service allows applications to dynamically change the software output sink of a video stream.
Control Service       Controls the video stream "on the fly," including adjusting brightness, contrast, frame rate, and data rate.
Information Service   Returns status and information about a specified video stream.
Initialization/Configuration   Initializes the video subsystem and calculates the cost, in terms of system resources, required to sustain certain video configurations. These costs can be used by other subsystems to determine the optimum product configuration for the given system.
______________________________________
Video API 508 supports the following function calls by audio/video conferencing application 502 to the video subsystem:
______________________________________
VOpen        Opens a video stream with specified attributes by allocating all necessary system resources (e.g., internal data structures) for it.
VCapture     Starts/stops capturing a video stream from a local video hardware source, such as a video camera or VCR.
VMonitor     Starts/stops monitoring a video stream captured from a local video camera or VCR.
VPlay        Starts/stops playing a video stream from a network, or remote, video source. When starting to play, the video frames are consumed from a network video source and displayed in a window pre-opened by the application.
VLinkIn      Links/unlinks a network . . . to/from a specified video stream, which will be played/is being played locally.
VLinkOut     Links/unlinks a network . . . to/from a specified video stream, which will be captured/is being captured from the local camera or VCR.
VGrabframe   Grabs the most current still image (reference frame) from a specified video stream and returns the frame in an application-provided buffer.
VPause       Starts/stops pausing a video stream captured/played locally.
VCntl        Controls a video stream by adjusting its parameters (e.g., tint/contrast, frame/data rate).
VGetInfo     Returns the status (VINFO and state) of a video stream.
VClose       Closes a video stream and releases all system resources allocated for this stream.
VInit        Initializes the video subsystem, starts capture and playback applications, and calculates system utilization for video configurations.
VShutdown    Shuts down the video subsystem and stops the capture and playback applications.
VCost        Calculates and reports the percentage CPU utilization required to support a given video stream.
______________________________________
These functions are defined in further detail later in this specification in a section entitled "Data Structures, Functions, and Messages."
Referring now to FIG. 10, there is shown a representation of the video FSM for the local video stream and the remote video stream of a conferencing system during a conferencing session, according to a preferred embodiment of the present invention. The possible video states are as follows:
______________________________________
VST_INIT      Initial state -- state of local and remote video streams after the application calls the CF_Init function.
VST_OPEN      Open state -- state of the local/remote video stream after system resources have been allocated.
VST_CAPTURE   Capture state -- state of local video stream being captured.
VST_LINKOUT   Link-out state -- state of local video stream being linked to video output (e.g., network output channel or output file).
VST_LINKIN    Link-in state -- state of remote video stream being linked to video input (e.g., network input channel or input file).
VST_PLAY      Play state -- state of remote video stream being played.
VST_ERROR     Error state -- state of local/remote video stream after a system resource failure occurs.
______________________________________
In a typical conferencing session between a caller and a callee, both the local and remote video streams begin in the VST_INIT video state of FIG. 10. The application calls the VOpen function to open the local video stream, taking the local video stream from the VST_INIT video state to the VST_OPEN video state. The application then calls the VCapture function to begin capturing the local video stream, taking the local video stream from the VST_OPEN video state to the VST_CAPTURE video state. The application then calls the VLinkOut function to link the local video stream to the video output channel, taking the local video stream from the VST_CAPTURE video state to the VST_LINKOUT video state.
The application calls the VOpen function to open the remote video stream, taking the remote video stream from the VST_INIT video state to the VST_OPEN video state. The application then calls the VLinkIn function to link the remote video stream to the video input channel, taking the remote video stream from the VST_OPEN video state to the VST_LINKIN video state. The application then calls the VPlay function to begin playing the remote video stream, taking the remote video stream from the VST_LINKIN video state to the VST_PLAY video state. The conferencing session proceeds without changing the video states of either the local or remote video stream.
When the conferencing session is to be terminated, the application calls the VClose function to close the remote video stream, taking the remote video stream from the VST_PLAY video state to the VST_INIT video state. The application also calls the VClose function to close the local video stream, taking the local video stream from the VST_LINKOUT video state to the VST_INIT video state.
This described scenario is just one possible video scenario. Those skilled in the art will understand that other scenarios may be constructed using the following additional functions and state transitions:
The application calls the VLinkOut function to unlink the local video stream from the video output channel, taking the local video stream from the VST_LINKOUT video state to the VST_CAPTURE video state.
The application calls the VCapture function to stop capturing the local video stream, taking the local video stream from the VST_CAPTURE video state to the VST_OPEN video state.
The application calls the VClose function to close the local video stream, taking the local video stream from the VST_OPEN video state to the VST_INIT video state.
The application calls the VClose function to close the local video stream, taking the local video stream from the VST_CAPTURE video state to the VST_INIT video state.
The application calls the VClose function to recover from a system resource failure, taking the local video stream from the VST_ERROR video state to the VST_INIT video state.
The application calls the VPlay function to stop playing the remote video stream, taking the remote video stream from the VST_PLAY video state to the VST_LINKIN video state.
The application calls the VLinkIn function to unlink the remote video stream from the video input channel, taking the remote video stream from the VST_LINKIN video state to the VST_OPEN video state.
The application calls the VClose function to close the remote video stream, taking the remote video stream from the VST_OPEN video state to the VST_INIT video state.
The application calls the VClose function to close the remote video stream, taking the remote video stream from the VST_LINKIN video state to the VST_INIT video state.
The application calls the VClose function to recover from a system resource failure, taking the remote video stream from the VST_ERROR video state to the VST_INIT video state.
The VGetInfo and VCntl functions may be called by the application from any video state of either the local or remote video stream, except for the VST_INIT state. The VPause and VGrabFrame functions may be called by the application for the local video stream from either the VST_CAPTURE or VST_LINKOUT video states or for the remote video stream from the VST_PLAY video state. The VMonitor function may be called by the application for the local video stream from either the VST_CAPTURE or VST_LINKOUT video states. All of the functions described in this paragraph leave the video state unchanged.
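The video FSM of FIG. 10, including the state-preserving calls and where they are allowed, can be sketched as a transition table plus an allowed-state map. Assumptions, labeled here and in the code: the "(on)"/"(off)" suffixes are a hypothetical way to distinguish start and stop calls, `resource_failure` is a hypothetical event label, and the text does not say from which states VST_ERROR is entered, so the sketch allows it from any state other than VST_INIT.

```python
from enum import Enum, auto


class VState(Enum):
    VST_INIT = auto()
    VST_OPEN = auto()
    VST_CAPTURE = auto()
    VST_LINKOUT = auto()
    VST_LINKIN = auto()
    VST_PLAY = auto()
    VST_ERROR = auto()


V = VState

# State-changing calls; "(on)"/"(off)" distinguishes start from stop.
TRANSITIONS = {
    (V.VST_INIT, "VOpen"): V.VST_OPEN,
    (V.VST_OPEN, "VCapture(on)"): V.VST_CAPTURE,    # local stream
    (V.VST_CAPTURE, "VLinkOut(on)"): V.VST_LINKOUT,
    (V.VST_LINKOUT, "VLinkOut(off)"): V.VST_CAPTURE,
    (V.VST_CAPTURE, "VCapture(off)"): V.VST_OPEN,
    (V.VST_OPEN, "VLinkIn(on)"): V.VST_LINKIN,      # remote stream
    (V.VST_LINKIN, "VPlay(on)"): V.VST_PLAY,
    (V.VST_PLAY, "VPlay(off)"): V.VST_LINKIN,
    (V.VST_LINKIN, "VLinkIn(off)"): V.VST_OPEN,
}

# VClose returns every state except VST_INIT to VST_INIT.
CLOSABLE = set(V) - {V.VST_INIT}

# Calls that leave the state unchanged, and the states each is allowed in.
IN_PLACE = {
    "VGetInfo": set(V) - {V.VST_INIT},
    "VCntl": set(V) - {V.VST_INIT},
    "VPause": {V.VST_CAPTURE, V.VST_LINKOUT, V.VST_PLAY},
    "VGrabFrame": {V.VST_CAPTURE, V.VST_LINKOUT, V.VST_PLAY},
    "VMonitor": {V.VST_CAPTURE, V.VST_LINKOUT},
}


def video_step(state: VState, call: str) -> VState:
    """Apply one call; raise ValueError if it is not valid in `state`."""
    if call == "VClose":
        if state in CLOSABLE:
            return V.VST_INIT
    elif call in IN_PLACE:
        if state in IN_PLACE[call]:
            return state
    elif call == "resource_failure":  # hypothetical label; source states assumed
        if state is not V.VST_INIT:
            return V.VST_ERROR
    elif (state, call) in TRANSITIONS:
        return TRANSITIONS[(state, call)]
    raise ValueError(f"{call} is not valid in {state.name}")
```

Local and remote streams share one state enumeration; their paths diverge only at VST_OPEN, where VCapture leads toward VST_LINKOUT and VLinkIn leads toward VST_PLAY.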
Video Manager
Referring now to FIG. 11, there is shown a block diagram of the software components of video manager (VM) 516 of FIG. 5, according to a preferred embodiment of the present invention. Video manager 516 is implemented using five major components:
______________________________________
Library (VM DLL 1102)              A Microsoft® Windows Dynamic Link Library (DLL) that provides the library of functions of video API 508.
Capture (VCapt EXE 1104)           A Microsoft® Windows application (independently executable control thread with stack, message queue, and data) which controls the capture and distribution of video frames from video board 204.
Playback (VPlay EXE 1106)          A Microsoft® Windows application which controls the playback (i.e., decode and display) of video frames received from either the network or a co-resident capture application.
Network Library (Netw DLL 1108)    A Microsoft® Windows DLL which provides interfaces to send and receive video frames across a network or in a local loopback path to a co-resident playback application. The Netw DLL hides details of the underlying network support from the capture and playback applications and implements (in a manner hidden from those applications) the local loopback function.
Audio-Video Synchronization Library (AVSync DLL 1110)    A Microsoft® Windows DLL which provides interfaces to enable the synchronization of video frames with a separate stream of audio frames for the purposes of achieving "lip-synchronization." AVSync DLL 1110 supports the implementation of an audio-video synchronization technique described later in this specification.
______________________________________
The five major components, and their interactions, define how the VM is decomposed for implementation purposes. In addition, five techniques provide full realization of the implementation:
______________________________________
Stream Restart           A technique for initially starting, and restarting, a video stream. If a video stream consists entirely of encoded "delta" frames, then the stream start/restart method quickly supplies the decoder with a "key" or reference frame. Stream restart is used when a video stream becomes out-of-sync with respect to the audio.
Synchronization          An audio-video synchronization technique for synchronizing a sequence, or stream, of video frames with an external audio source.
Bit Rate Throttling      A technique by which the video stream bit rate is controlled so that video frame data co-exists with other video conferencing components. This technique is dynamic in nature and acts to "throttle" the video stream (up and down) in response to higher priority requests (higher than video data priority) made at the network interface.
Multiple Video Formats   A technique by which multiple video formats are used to optimize transfer, decode, and display costs when video frames are moved between video board 204 and host processor 202. This technique balances video frame data transfer overhead with host processor decode and display overhead in order to efficiently implement a local video monitor.
Self-Calibration         A self-calibration technique which is used to determine the amount of motion video a PC system can support. This allows conferencing system 100 to vary video decode and display configurations in order to run on a range of PC systems. It is particularly applicable in software-playback systems.
______________________________________
Capture/Playback Video Effects
This subsection describes an important feature of the VM implementation that affects the implementation of both the capture and playback applications (VCapt EXE 1104 and VPlay EXE 1106). One of the key goals of VM capture and playback is that while local Microsoft® Windows application activity may impact local video playback, it need not affect remote video playback. That is, due to the non-preemptive nature of the Microsoft® Windows environment, the VPlay application may not get control to run, and as such, local monitoring and local playback of remote video will be halted. However, if captured frames are delivered as a part of capture hardware interrupt handling, and network interfaces are accessible at interrupt time, then captured video frames can be transmitted on the network regardless of local conditions.
With respect to conferencing system 100, both of these conditions are satisfied. This is an important feature in an end-to-end conferencing situation, where the local endpoint is unaware of remote endpoint processing, and can only explain local playback starvation as a result of local activity. The preferred capture and playback application design ensures that remote video is not lost due to remote endpoint activity.
Video Stream Restart
The preferred video compression method for conferencing system 100 (i.e., ISDN rate video, or IRV) contains no key frames (i.e., reference frames). Every frame is a delta (i.e., difference) frame based on the preceding decoded video frame. In order to establish a complete video image, IRV dedicates a small part (preferably 1/85th) of each delta frame to key frame data. This key part of an IRV delta frame is complete in itself and does not require inter-frame decoding. The position of the key information changes from one delta frame to the next, so that it is said to "walk" through a delta frame sequence; this use of partial key information is therefore referred to as the "walking key frame."
Referring now to FIG. 12, there is shown a representation of a sequence of N walking key frames. For a walking key frame of size 1/N, the kth frame in a sequence of N frames (k ≤ N) has its kth component consisting of key information. On decode, that kth component is complete and accurate. Provided frame k+1 is decoded correctly, the kth component of the video stream will remain accurate, since it is based on a kth key component and a correct decode of frame k+1. In effect, a complete key frame is generated every N frames, spread across those N delta frames, in order to provide the decoder with up-to-date reference information within N frames.
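The walking-key-frame schedule can be modeled in a few lines to show why a decoder that joins a stream needs a full cycle of N frames before its image is complete. This is a minimal sketch, assuming strictly round-robin assignment of key components (frame k carries component k, wrapping every N frames) and 0-based numbering; the function names are hypothetical, and the model tracks only which components have been refreshed from key data, not the codec itself.

```python
def key_component(frame_index: int, n: int) -> int:
    """Component (0..n-1) that a frame carries as intra-coded key data.

    Frames and components are numbered from 0 here (the text counts from 1);
    the key component advances by one per frame and wraps every n frames.
    """
    return frame_index % n


def frames_until_complete(start_frame: int, n: int) -> int:
    """Frames a decoder that joins at start_frame must receive before every
    component has been refreshed from key data at least once."""
    refreshed = set()
    count = 0
    frame = start_frame
    while len(refreshed) < n:
        refreshed.add(key_component(frame, n))
        frame += 1
        count += 1
    return count


if __name__ == "__main__":
    N = 85  # walking key frame size of 1/85, as in the text
    print(frames_until_complete(0, N))   # 85
    print(frames_until_complete(40, N))  # 85: joining mid-cycle is no faster
```

Because every component appears exactly once per cycle, the startup cost is always N frames regardless of where the decoder joins.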
For a continuous and uninterrupted stream of video frames, the walking key frame provides key information without the bit-rate fluctuations that would occur if a complete key frame were sent at regular intervals. However, without a complete key frame, video startup requires collecting all walking key frame components, which requires a delay of N frames. If video startup/restart occurs often, this can be problematic, especially if N is large. For example, at 10 frames per second (fps) with N=85, the startup/restart time to build video from scratch is 8.5 seconds.
In order to accelerate IRV stream startup and restart, an IRV capture driver "Request Key Frame" interface is used to generate a complete key frame on demand. The complete key frame "compresses" N frames of walking key frames into a single frame, and allows immediate stream startup once it is received and decoded. Compressed IRV key frames for (160×120) video images are approximately 6-8 KBytes in length. Assuming an ISDN bandwidth of 90 kbits/sec dedicated to video, a key frame takes approximately 0.5-0.7 seconds to transmit. Given a walking key frame size of 1/85 (N=85) and a frame rate of 10 fps, use of a complete key frame to start/restart a video stream can decrease the startup delay from 8.5 seconds to roughly 0.5-0.7 seconds.
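The two startup delays compared above can be checked with a short calculation. This sketch assumes 1 KByte = 1000 bytes and treats the 90 kbits of ISDN bandwidth as 90,000 bits per second available to video; the function names are hypothetical.

```python
def walking_key_startup_s(n: int, fps: float) -> float:
    """Seconds to assemble a full image from walking key components alone."""
    return n / fps


def key_frame_tx_s(key_frame_bytes: int, video_bps: float) -> float:
    """Seconds to transmit one complete key frame over the video share of the link."""
    return key_frame_bytes * 8 / video_bps


if __name__ == "__main__":
    print(walking_key_startup_s(85, 10))  # 8.5
    for size_kbytes in (6, 8):
        t = key_frame_tx_s(size_kbytes * 1000, 90_000)
        print(f"{size_kbytes} KB key frame: {t:.2f} s")
    # 6 KB key frame: 0.53 s
    # 8 KB key frame: 0.71 s
```

The walking-key delay scales with N, while the key-frame delay scales only with key frame size over link rate, which is why on-demand key frames pay off most when N is large.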
In order for walking key frame compression to be successful, the delta frame rate must be lowered during key frame transmission. Delta frames generated during key frame transmission are likely to be "out-of-sync" with respect to establishing audio-video synchronization, and, given the size of a key frame, too many delta frames will exceed the overall ISDN bandwidth. The IRV capture driver bit rate controller therefore takes key frame data into account in its frame generation logic and decreases the frame rate immediately following a key frame.
A key frame once received may be "out-of-sync" with respect to the audio stream due to its lengthy transmission time. Thus, key frames will be decoded but not displayed, and the video stream will be "in-sync" only when the first follow-on delta frame is received. In addition, the "way-out-of-sync" window is preferably sized appropriately so that key frame transmission does not cause the stream to require repeated restarts.
Once it is determined that a stream requires restart, either as part of call establishment or due to synchronization problems, the local endpoint requiring the restart transmits a restart control message to the remote capture endpoint requesting a key frame. The remote capture site responds by requesting its capture driver to generate a key frame. The key frame is sent to the local endpoint when generated. The endpoint requesting the restart sets a timer immediately following the restart request. If a key frame is not received after an adequate delay, the restart request is repeated.
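The restart handshake on the requesting side reduces to "send, arm a timer, re-send on expiry, disarm when a key frame arrives." The following is a minimal sketch, assuming time is supplied by the caller and that `send_restart` is a hypothetical hook standing in for the restart control message; the retry timeout value is also an assumption, since the text says only "an adequate delay."

```python
from typing import Callable, Optional


class RestartRequester:
    """Requests a key frame from the remote capture endpoint and repeats
    the request if no key frame arrives within retry_timeout seconds.

    Time is passed in explicitly so the logic can be driven by any clock.
    """

    def __init__(self, send_restart: Callable[[], None], retry_timeout: float):
        self.send_restart = send_restart        # hypothetical transport hook
        self.retry_timeout = retry_timeout
        self.deadline: Optional[float] = None   # None means no restart pending

    def request(self, now: float) -> None:
        """Send a restart control message and arm the retry timer."""
        self.send_restart()
        self.deadline = now + self.retry_timeout

    def on_key_frame(self) -> None:
        """A key frame arrived: the restart is satisfied, so disarm the timer."""
        self.deadline = None

    def tick(self, now: float) -> None:
        """Call periodically; re-sends the request if the timer has expired."""
        if self.deadline is not None and now >= self.deadline:
            self.request(now)

    @property
    def pending(self) -> bool:
        return self.deadline is not None


if __name__ == "__main__":
    sent = []
    r = RestartRequester(lambda: sent.append("RESTART"), retry_timeout=2.0)
    r.request(now=0.0)
    r.tick(now=1.0)    # timer not yet expired, nothing sent
    r.tick(now=2.5)    # timer expired, request repeated
    r.on_key_frame()
    r.tick(now=10.0)   # no restart pending any more
    print(len(sent))   # 2
```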
Audio/Video Synchronization
Video manager 516 is responsible for synchronizing the video stream with the audio stream in order to achieve "lip-synchronization." Because of the overall conferencing architecture, the audio and video subsystems do not share a common clock. In addition, again because of system design, the audio stream is a more reliable, lower latency stream than the video stream. For these reasons, the video stream is synchronized by relying on information regarding capture and playback audio timing.
For VM audio/video (A/V) synchronization, audio stream packets are timestamped from an external clock at the time they are captured. When an audio packet is played, its timestamp represents the current audio playback time. Every video frame captured is stamped with a timestamp, derived from the audio system, that is the capture timestamp of the last audio packet captured. At the time of video playback (decode and display, typically at the remote endpoint of a video conference), the video frame timestamp is compared with the current audio playback time, as derived from the audio system.
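The timestamp flow can be sketched as follows. Capture and playback normally run at different endpoints; this sketch keeps both sides in one hypothetical object only to show how the audio-derived timestamps travel, and it assumes integer millisecond timestamps from the external clock.

```python
from dataclasses import dataclass


@dataclass
class AudioPacket:
    timestamp_ms: int   # capture time taken from the external clock


@dataclass
class VideoFrame:
    timestamp_ms: int   # borrowed from the last captured audio packet


class AVTimebase:
    """Audio-derived time references used for A/V synchronization."""

    def __init__(self) -> None:
        self.last_audio_capture_ms = 0   # capture side
        self.audio_playback_ms = 0       # playback side

    def capture_audio(self, external_clock_ms: int) -> AudioPacket:
        """Stamp a newly captured audio packet from the external clock."""
        self.last_audio_capture_ms = external_clock_ms
        return AudioPacket(external_clock_ms)

    def capture_video(self) -> VideoFrame:
        """Stamp a video frame with the last audio capture timestamp."""
        return VideoFrame(self.last_audio_capture_ms)

    def play_audio(self, packet: AudioPacket) -> None:
        """Playing a packet makes its timestamp the current playback time."""
        self.audio_playback_ms = packet.timestamp_ms

    def skew_ms(self, frame: VideoFrame) -> int:
        """Audio playback time minus frame timestamp; positive when video lags."""
        return self.audio_playback_ms - frame.timestamp_ms


if __name__ == "__main__":
    tb = AVTimebase()
    p1 = tb.capture_audio(1000)
    p2 = tb.capture_audio(1020)
    frame = tb.capture_video()        # stamped 1020
    p3 = tb.capture_audio(1040)
    for packet in (p1, p2, p3):
        tb.play_audio(packet)
    print(tb.skew_ms(frame))          # 20
```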
Two windows, or time periods, $\delta_1$ and $\delta_2$, with $\delta_1 < \delta_2$, are defined as part of VM initialization. Let $V_T$ be the timestamp of a given video frame, and let $A_T$ be the current audio playback time when that video frame is to be played. A/V synchronization is defined as follows:
1. If $|A_T - V_T| \le \delta_1$, then the video stream is "in-sync" and played normally (i.e., decoded and displayed immediately).
2. If $\delta_1 < |A_T - V_T| \le \delta_2$, then the video stream is "out-of-sync" and a "hurry-up" technique is used to attempt re-synchronization. If a video stream remains out-of-sync for too many consecutive frames, then it becomes "way-out-of-sync" and requires a restart.
3. If $|A_T - V_T| > \delta_2$, then the video stream is "way-out-of-sync" and requires a restart.
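The three synchronization rules, together with the consecutive-frame escalation in rule 2, can be expressed as a small classifier. This is a minimal sketch; `max_out_of_sync` is a hypothetical parameter, because the text says only "too many consecutive frames," and the window values used in the example are illustrative.

```python
from enum import Enum


class SyncState(Enum):
    IN_SYNC = "in-sync"                  # decode and display immediately
    OUT_OF_SYNC = "out-of-sync"          # decode with hurry-up
    WAY_OUT_OF_SYNC = "way-out-of-sync"  # restart the stream


def classify(a_t: float, v_t: float, delta1: float, delta2: float) -> SyncState:
    """Rules 1-3: compare |a_t - v_t| against the two windows."""
    skew = abs(a_t - v_t)
    if skew <= delta1:
        return SyncState.IN_SYNC
    if skew <= delta2:
        return SyncState.OUT_OF_SYNC
    return SyncState.WAY_OUT_OF_SYNC


class SyncMonitor:
    """Rules 1-3 plus the consecutive-frame escalation of rule 2.

    max_out_of_sync (hypothetical) is how many consecutive out-of-sync
    frames are tolerated before the stream is declared way-out-of-sync.
    """

    def __init__(self, delta1: float, delta2: float, max_out_of_sync: int):
        if not delta1 < delta2:
            raise ValueError("delta1 must be smaller than delta2")
        self.delta1 = delta1
        self.delta2 = delta2
        self.max_out_of_sync = max_out_of_sync
        self.run = 0  # consecutive out-of-sync frames so far

    def check(self, a_t: float, v_t: float) -> SyncState:
        state = classify(a_t, v_t, self.delta1, self.delta2)
        if state is SyncState.OUT_OF_SYNC:
            self.run += 1
            if self.run > self.max_out_of_sync:
                state = SyncState.WAY_OUT_OF_SYNC
        if state is not SyncState.OUT_OF_SYNC:
            self.run = 0  # back in sync, or a restart resets the count
        return state
```

With illustrative windows of 40 ms and 200 ms, a frame 100 ms behind the audio is out-of-sync, and the third such frame in a row would trigger a restart when `max_out_of_sync=2`.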
Because of the overall design of conferencing system 100, a video stream sent from one endpoint to another is "behind" its corresponding audio stream. That is, the transmission and reception of a video frame takes longer than the transmission and reception of an audio frame. This is due to the design of video and audio capture and playback sites relative to the network interface, as well as video and audio frame size differences. In order to compensate for this, the audio system allows capture and playback latencies to be set for an audio stream. Audio capture and playback latencies artificially delay the capture and playback of an audio stream.
As part of the VLinkOut function, video manager 516 calls audio manager 520 to set an audio capture latency. As part of the VLinkIn function, video manager 516 calls audio manager 520 to set an audio playback latency. Once the latencies are set, they are preferably not changed. The capture and playback latency values are specified in milliseconds, and defined as part of VM initialization. They may be adjusted as part of the Calibration process.
In order to attempt re-synchronization when a stream is not too far "out-of-sync" as defined by the above rules, a feature called "hurry-up" is used. When a video frame is passed to the codec for decode with hurry-up specified, the codec decodes the frame to a YUV intermediate format but does not execute the YUV-to-RGB color conversion, and the frame is not displayed. Although the output is not color converted for RGB graphics display, hurry-up maintains the playback decode stream for following frames. By decreasing the decode/display cost per frame and processing frames on demand (the number of frames processed for playback per second can vary), a video stream that is out-of-sync can become in-sync.
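The hurry-up path can be sketched as a decode step that always runs, followed by color conversion and display steps that are skipped in hurry-up mode. The three callables are hypothetical stand-ins for the codec and display driver; the point of the sketch is the ordering, since the YUV decode must still happen to keep the delta-frame decode chain intact.

```python
from typing import Callable


def play_frame(frame: bytes,
               hurry_up: bool,
               decode_to_yuv: Callable[[bytes], bytes],
               yuv_to_rgb: Callable[[bytes], bytes],
               display: Callable[[bytes], None]) -> bool:
    """Decode one frame; in hurry-up mode stop after the YUV decode.

    The YUV decode always runs because the next delta frame is decoded
    relative to this one.  Returns True if the frame was displayed.
    """
    yuv = decode_to_yuv(frame)
    if hurry_up:
        return False             # decoded, but neither converted nor shown
    display(yuv_to_rgb(yuv))
    return True


if __name__ == "__main__":
    log = []

    def decode(f: bytes) -> bytes:
        log.append("decode")
        return f

    def convert(y: bytes) -> bytes:
        log.append("convert")
        return y

    play_frame(b"delta", True, decode, convert, lambda rgb: log.append("display"))
    play_frame(b"delta", False, decode, convert, lambda rgb: log.append("display"))
    print(log)  # ['decode', 'decode', 'convert', 'display']
```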
Bit Rate Throttling
Conferencing system 100 supports a number of different media: audio, video, and data. These media are prioritized in order to share the limited network (e.g., ISDN) bandwidth. A priority order of (highest-to-lowest) audio, data, and video is designated. In this scheme, network bandwidth that is used for video will need to give way to data, when data conferencing is active (audio is not compromised). In order to implement the priority design, a mechanism for dynamically throttling the video bit stream is used. It is a self-throttling system, in that it does not require input from a centralized bit rate controller. It both throttles down and throttles up a video bit stream as a function of available network bandwidth.
A latency is the period of time needed to complete the transfer of a given amount of data at a given bit rate. For example, transferring 10 kbits at 10 kbits/sec has a latency of 1 second. A throttle down latency is the latency at which a bit stream is throttled down (i.e., its rate is lowered), and a throttle up latency is the latency at which a bit stream is throttled up (i.e., its rate is increased).
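The text defines the two threshold latencies but not how the rate moves between them. The sketch below assumes the latency is measured on the data backlog at the network interface at the current video rate, and that the rate moves in fixed steps between fixed limits; the step size, limits, and names are all hypothetical.

```python
def latency_s(bits: float, rate_bps: float) -> float:
    """Time to move `bits` at `rate_bps`; 10 kbits at 10 kbits/s takes 1 s."""
    return bits / rate_bps


class VideoThrottle:
    """Self-throttling video rate controller (illustrative sketch).

    Above throttle_down_s of backlog latency the video rate steps down;
    below throttle_up_s it steps back up.  The gap between the two
    thresholds keeps the rate from oscillating.
    """

    def __init__(self, rate_bps: float, min_bps: float, max_bps: float,
                 throttle_down_s: float, throttle_up_s: float, step_bps: float):
        if not throttle_up_s < throttle_down_s:
            raise ValueError("throttle-up latency must be below throttle-down latency")
        self.rate_bps = rate_bps
        self.min_bps = min_bps
        self.max_bps = max_bps
        self.throttle_down_s = throttle_down_s
        self.throttle_up_s = throttle_up_s
        self.step_bps = step_bps

    def update(self, backlog_bits: float) -> float:
        """Adjust and return the video rate given the current backlog."""
        latency = latency_s(backlog_bits, self.rate_bps)
        if latency > self.throttle_down_s:
            self.rate_bps = max(self.min_bps, self.rate_bps - self.step_bps)
        elif latency < self.throttle_up_s:
            self.rate_bps = min(self.max_bps, self.rate_bps + self.step_bps)
        return self.rate_bps
```

No central controller is involved: each call looks only at the local backlog, which matches the self-throttling property described above.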
Multiple Video Formats
Conferencing system 100 presents both a local monitor display and a remote playback display to the user. A digital video resolution of (160×120) is preferably used as the capture resolution for ISDN-based video conferencing (i.e., the resolution of the coded compressed video stream sent to a remote site). (160×120) and (320×240) are preferably used as local monitor display resolutions. (320×240) resolution may also be used for high-resolution still images. Generating the local monitor display by decompressing and color converting the compressed video stream would be computationally expensive. Instead, video capture driver 522 of FIG. 5 simultaneously generates both a compressed video stream and an uncompressed video stream, and video manager 516 uses the uncompressed video stream to generate the local monitor display. Video manager 516 may select the format of the uncompressed video stream to be either YUV-9 or 8 bits/pixel (bpp) RGB Device Independent Bitmap (DIB) format. For a (160×120) local monitor, the uncompressed DIB video stream may be displayed directly. For a (320×240) monitor, a (160×120) YUV-9 format is used, and the display driver "doubles" the image size to (320×240) as part of the color conversion process.
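The local-monitor format choice, and the size of each uncompressed frame it implies, can be sketched as below. The format labels are hypothetical. The YUV-9 size assumes the usual planar layout with a full-resolution Y plane and U and V planes subsampled by 4 in each direction (9 bits per pixel on average); the DIB size excludes the header and palette and assumes a width that is a multiple of 4, so rows need no padding.

```python
from typing import Tuple


def local_monitor_format(width: int, height: int) -> Tuple[str, int, int]:
    """Pick the uncompressed capture format for a local monitor size."""
    if (width, height) == (160, 120):
        return ("RGB8_DIB", 160, 120)   # displayed directly
    if (width, height) == (320, 240):
        return ("YUV9", 160, 120)       # display driver doubles to 320x240
    raise ValueError(f"unsupported monitor size {width}x{height}")


def frame_bytes(fmt: str, width: int, height: int) -> int:
    """Uncompressed image bytes, excluding any DIB header or palette."""
    if fmt == "RGB8_DIB":
        return width * height           # 8 bits per pixel
    if fmt == "YUV9":
        chroma = (width // 4) * (height // 4)
        return width * height + 2 * chroma   # Y plane plus U and V planes
    raise ValueError(f"unknown format {fmt}")


if __name__ == "__main__":
    for size in ((160, 120), (320, 240)):
        fmt, w, h = local_monitor_format(*size)
        print(size, fmt, frame_bytes(fmt, w, h))
    # (160, 120) RGB8_DIB 19200
    # (320, 240) YUV9 21600
```

Capturing the 320×240 monitor as 160×120 YUV-9 moves roughly a quarter of the data a full-size 8-bpp frame would, with the enlargement deferred to the color-conversion step.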
In the RGB and YUV-9 capture modes, RGB or YUV data are appended to capture driver IRV buffers, so that the capture application (VCapt EXE 1104) has access to both fully encoded IRV frames and either RGB or YUV data. Conferencing system 100 has custom capture driver interfaces to select either RGB capture mode, YUV capture mode, or neither.
Self-Calibration
CPU, I/O bus, and display adapter characteristics vary widely from computer to computer. The goal of VM self-calibration is to support software-based video playback on a variety of PC platforms, without having to "hard-code" fixed system parameters based on knowledge of the host PC. VM self-calibration measures the host PC in order to determine the decode and display overheads that it can support. VM self-calibration also offers a cost function that upper-layer software may use to determine whether selected display options, for a given video compression format, are supported.
There are three major elements to the self-calibration:
1. The calibration of software decode, using actual video decompress cycles to measure decompression costs. Both RGB/YUV capture mode and IRV frames are decoded in order to provide accurate measurement of local (monitor) and remote video decode. YUV (160×120) and YUV (320×240) formats are also decoded (color converted) to provide costs associated with the YUV preview feature of the video subsystem.
2. A calibration of PC displays, at varying resolutions, using actual video display cycles to measure display costs.
3. A video cost function, available to applications, that takes as input frame rate, display rate, display resolution, video format, and miscellaneous video stream characteristics, and outputs a system utilization percentage representing the total system cost for supporting a video decompress and display having the specified characteristics.
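A cost function of the kind described in element 3 can be sketched as measured per-frame costs multiplied by the requested rates. This is a minimal sketch, assuming a simple linear model (decode cost per frame times frame rate, plus display cost per frame times display rate) and ignoring the "miscellaneous video stream characteristics"; the data layout, names, and measurement values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Calibration:
    """Per-frame costs measured by self-calibration, in milliseconds."""
    decode_ms: Dict[str, float]                 # video format -> decode cost
    display_ms: Dict[Tuple[int, int], float]    # resolution -> display cost


def video_cost(cal: Calibration, video_format: str, frame_rate: float,
               display_rate: float, resolution: Tuple[int, int]) -> float:
    """Estimated system utilization, in percent, for decoding frame_rate
    frames/s of video_format and displaying display_rate of them per second
    at the given resolution."""
    busy_ms_per_s = (frame_rate * cal.decode_ms[video_format]
                     + display_rate * cal.display_ms[resolution])
    return 100.0 * busy_ms_per_s / 1000.0


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    cal = Calibration(decode_ms={"IRV": 12.0},
                      display_ms={(160, 120): 6.0, (320, 240): 20.0})
    print(video_cost(cal, "IRV", 10, 10, (160, 120)))  # 18.0
    print(video_cost(cal, "IRV", 10, 10, (320, 240)))  # 32.0
```

Separating frame rate from display rate lets the same model price hurry-up style playback, where frames are decoded but not all are displayed.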
Calibration is run before the initial run on a newly installed system; thereafter, the calibration software detects a CPU upgrade or display driver modification in order to determine whether calibration must be run again.
VM DLL
Referring again to FIG. 11, video manager dynamic link library (VM DLL) 1102 is a video stream "object manager." That is, with few exceptions, all VM DLL interfaces take a "Video Stream Object Handle" (HVSTRM) as input, and the interfaces define a set of operations, or functions, on a stream object. Multiple stream objects may be created.
Video API 508 defines all of the external interfaces to VM DLL 1102. There are also a number of VM internal interfaces to VM DLL 1102 that are used by VCapt EXE 1104, VPlay EXE 1106, Netw DLL 1108, and AVSync DLL 1110 for the purposes of manipulating a video stream at a lower level than that available to applications. The vm.h file, provided to applications that use the VM DLL, contains a definition of all EPS and VM internal interfaces. EPS interfaces are prefixed with a `V`; VM internal interfaces are prefixed with a `VM`. Finally, there are a number of VM private interfaces, available only to the VM DLL code, used to implement the object functions; for example, there are stream object validation routines. The self-calibration code is a separate module linked with the VM DLL code proper.
Video API calls, following HVSTRM and parameter validation, are typically passed down to either VCapt or VPlay for processing. This is implemented using the Microsoft® Windows SDK SendMessage interface. SendMessage takes as input the window handle of the target application and synchronously calls the main window procedure of that application. As part of VM initialization, VM starts execution of the VCapt and VPlay applications. As part of their WinMain processing, these applications use a VMRegister interface to return their window handles to VM DLL 1102. Using the registered window handles, VM DLL 1102 can then call the SendMessage interface. For every video API interface, there is a corresponding parameter block structure used to pass parameters to VCapt or VPlay. These structures are defined in the vm.h file. In addition to the WinExec startup and video API interface calls, VM DLL 1102 can also send a shutdown message to VCapt and VPlay for termination processing.
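The registration-then-dispatch pattern can be simulated without Windows to show the order of operations: the capture application registers a handle, and each API call validates its stream handle, builds a parameter block, and makes a synchronous call into the target's window procedure. This is a toy model, not the Win32 API: `WindowRegistry`, `CaptureParams`, `vm_register`, and `v_capture` are hypothetical names, and a real implementation would call SendMessage with a window handle, message identifier, and message parameters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set

# A "window procedure": takes a message name and a parameter block.
WindowProc = Callable[[str, object], int]


class WindowRegistry:
    """Toy stand-in for window handles plus a synchronous SendMessage."""

    def __init__(self) -> None:
        self._procs: Dict[int, WindowProc] = {}
        self._next_hwnd = 1

    def create_window(self, proc: WindowProc) -> int:
        hwnd = self._next_hwnd
        self._next_hwnd += 1
        self._procs[hwnd] = proc
        return hwnd

    def send_message(self, hwnd: int, msg: str, param: object) -> int:
        # Runs the target window procedure before returning, like SendMessage.
        return self._procs[hwnd](msg, param)


@dataclass
class CaptureParams:
    """Hypothetical parameter block for a capture on/off call."""
    hvstrm: int
    on: bool


class VideoManager:
    """Validates the stream handle, then forwards the call to VCapt."""

    def __init__(self, windows: WindowRegistry) -> None:
        self.windows = windows
        self.vcapt_hwnd: Optional[int] = None
        self.streams: Set[int] = set()  # valid HVSTRM values

    def vm_register(self, hwnd: int) -> None:
        """Analogue of VMRegister, called from VCapt's WinMain."""
        self.vcapt_hwnd = hwnd

    def v_capture(self, hvstrm: int, on: bool) -> int:
        if hvstrm not in self.streams:
            raise ValueError("invalid HVSTRM")
        if self.vcapt_hwnd is None:
            raise RuntimeError("VCapt has not registered its window handle")
        return self.windows.send_message(
            self.vcapt_hwnd, "WM_VCAPTURE_CALL", CaptureParams(hvstrm, on))


if __name__ == "__main__":
    windows = WindowRegistry()
    received = []

    def vcapt_wndproc(msg: str, param: object) -> int:
        received.append((msg, param))
        return 0  # success

    vm = VideoManager(windows)
    vm.vm_register(windows.create_window(vcapt_wndproc))  # done in WinMain
    vm.streams.add(7)                                     # a created stream
    print(vm.v_capture(7, on=True))  # 0
    print(received[0][0])            # WM_VCAPTURE_CALL
```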
Immediately following the successful initialization of VCapt and VPlay, VM 516 calls the interface `videoMeasure` in order to run self-calibration. The VCost interface is available, at run-time, to return measurement information, per video stream, to applications.
VCapt EXE
The video capture application (VCapt EXE 1104) implements all details of video frame capture and distribution to the network, including:
Control of the video board capture driver.
Video format handling to support IRV and RGB/YUV capture mode.
Video frame capture callback processing of captured video frames.
Copy followed by PostMessage transfer of video frames to local playback application (VPlay EXE).
Transmission, via Netw DLL 1108, of video frames to the network.
Mirror, zoom, camera video attributes, and miscellaneous capture stream control processing.
Handling of restart requests from a remote endpoint.
Shutdown processing.
VCapt EXE 1104 processing may be summarized as a function of the Microsoft® Windows messages as follows:
WINMAIN
Initialize application.
Get VCapt EXE initialization (INI) settings.
Open video board capture driver.
Register window handle (and status) with VM DLL 1102.
Enter Microsoft® Windows message loop.
WM_VCAPTURE_CALL (ON)
Register audio callback with audio manager 520.
Set audio capture latency with audio manager 520.
Initialize the video board capture stream based on stream object attributes.
WM_VLINKOUT_CALL (ON)
Register Netw callback handler for transmission completion handling.
Initialize bit rate throttling parameters.
WM_MONITOR_DATA_RTN
Decrement reference count on video frame (user context buffers).
WM_PLAY_DATA_RTN
Add buffer back to capture driver.
This message is only in loopback cas