United States Patent Application 20030093790
Kind Code A1
Logan, James D.; et al. May 15, 2003

Audio and video program recording, editing and playback systems using metadata
Abstract
A system for utilizing metadata created either at a central location for shared use by connected users, or at each individual user's location, to enhance the user's enjoyment of available broadcast programming content. A variety of mechanisms are employed for automatically and manually identifying and designating programming segments, associating descriptive metadata with the identified segments, distributing the metadata for use at client locations, and using the supplied metadata to selectively record and play back desired programming.

Inventors:Logan; James D. (Windham, NH), Durgin; Scott A.  (North Andover, MA), Doe; Brian D.  (Windham, NH), Colella; Vincent E.  (Wilmington, MA), Hale; McFarland  (North Chelmsford, MA), Mansfield; Paul M.  (Burlington, MA), Read; Gregory J.  (Newbury, MA), Santos; Jeffrey M.  (Newburyport, MA), Palone; Michael G.  (Acton, MA), Boone; Stephen  (Windham, NH)
Correspondence Name and Address:68 HORSE POND ROAD
CHARLES G. CALL
WEST YARMOUTH
MA
02673-2516
US
Series Code:165587
Filed:June 8, 2002
U.S. Current Class:725/38; 725/134; 725/142; 725/61; 345/783; 345/845
U.S. Class at Publication:725/38; 725/134; 725/142; 725/61; 345/783; 345/845
Intern'l Class:G06F 003/00; H04N 005/445; G06F 013/00; H04N 007/173; H04N 007/16; G09G 005/00

Claims


What is claimed is:
1. A method for selectively reproducing recorded video program segments comprising the steps of: storing said video program segments in a mass storage device, storing playlist metadata specifying a selected set of said stored segments and the ordered sequence in which the segments in said set are to be reproduced on a display device in the absence of an intervening control command from the viewer, said playlist metadata further including a text description of each segment in said set, reproducing said set of stored segments on said display device in the ordered sequence specified by said playlist, at the request of a viewer, displaying a segment guide listing on said display device, said guide listing containing the text description of each segment in said set with the text description of the segment currently being reproduced being visually identified on said guide listing, visually identifying a different selected one of said text descriptions on said guide listing in response to a selection command from a viewer, and reproducing the video segment described by said selected one of said text descriptions in response to further command from said viewer.

Description



CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of the filing date of the following co-pending applications: U.S. Provisional Patent Application Serial No. 60/297,204 filed on Jun. 8, 2001 entitled "Methods and Apparatus for Navigating Time-Shifted Television Programming;" U.S. Provisional Patent Application Serial No. 60/352,788 filed Nov. 28, 2001 entitled "Methods and Apparatus for Distributing Segmented Television Programming;" U.S. Utility patent application Ser. No. 09/536,969 filed on Mar. 28, 2000 entitled "Systems and Methods for Modifying Broadcast Programming;" U.S. Provisional Application Serial No. 60/304,570 filed on Jul. 11, 2001 entitled "Audio and Video Program Recording, Editing and Playback Systems using Metadata;" U.S. Provisional Application Serial No. 60/336,602 filed on Dec. 3, 2001 entitled "Methods and Apparatus for Automatically Bookmarking Programming Content;" and U.S. Utility application Ser. No. 10/060,001 filed on Jan. 29, 2002 entitled "Audio and Video Program Recording, Editing and Playback Systems Using Metadata." The disclosure of each of the foregoing applications is hereby incorporated herein by reference.

REFERENCE TO COMPUTER PROGRAM LISTING APPENDIX

[0002] A computer program listing appendix is stored on each of two duplicate compact disks which accompany this specification. Each disk contains computer program listings that illustrate implementations of the invention. The listings are recorded as ASCII text in IBM PC/MS DOS compatible files which have the names, sizes (in bytes) and creation dates listed below:

Size (bytes)  Date created            Filename
3,123         May 21, 2001 12:08 p    ToolBarDlg.h
1,141         May 21, 2001 12:08 p    CAnimate.h
65,419        Jun. 05, 2001 11:13 p   CNPDlg.cpp
5,228         Jun. 04, 2001 10:20 a   CNPDlg.h
33,964        Jun. 04, 2001 10:16 a   ConceptDlg.cpp
6,337         Jun. 01, 2001 10:30 a   ConceptDlg.h
19,459        Jun. 04, 2001 5:37 p    CPLDlg.cpp
3,553         May 23, 2001 4:24 a     CPLDlg.h
3,758         May 21, 2001 12:08 p    cpp
26,672        Jun. 04, 2001 5:39 p    CSegmentDriver.cpp
6,317         May 29, 2001 2:04 p     CSegmentDriver.h
29,715        May 21, 2001 12:08 p    dxmplayer.cpp
7,527         May 21, 2001 12:08 p    dxmplayer.h
1,202         May 21, 2001 12:08 p    DXMPLayerConstants.h
5,258         May 21, 2001 12:08 p    DXMPlayerEventSink.cpp
2,239         May 21, 2001 12:08 p    DXMPlayerEventSink.h
7,753         Jun. 01, 2001 10:28 a   GClient.clw
3,259         May 24, 2001 12:14 p    GClient.cpp
5,771         May 21, 2001 12:08 p    GClient.dep
13,602        Jun. 04, 2001 10:21 a   GClient.dsp
537           May 21, 2001 12:08 p    GClient.dsw
10,333        Jun. 05, 2001 1:09 p    GClient.h
16,693        May 21, 2001 12:08 p    GClient.mak
541,696       Jun. 01, 2001 10:24 a   GClient.ncb
79,872        Jun. 01, 2001 10:23 a   GClient.opt
1,704         Jun. 04, 2001 10:26 a   GClient.plg
35,194        Jun. 05, 2001 10:46 p   GClient.rc
4,127         May 21, 2001 12:08 p    GClientDlg.cpp
1,353         May 21, 2001 12:08 p    GClientDlg.h
12,074        May 21, 2001 12:08 p    gmanager.cpp
3,404         May 21, 2001 12:08 p    gmanager.h
7,706         May 23, 2001 4:19 a     IMediaPlayer.h
20,350        Jun. 04, 2001 3:56 p    InfoDlg.cpp
3,187         May 24, 2001 12:17 p    InfoDlg.h
8,988         May 21, 2001 12:08 p    ISegmentDriver.h
3,020         May 28, 2001 6:52 a     ISegmentListCtrl.h
10,152        May 23, 2001 4:22 a     MediaPlayer.cpp
1,894         May 23, 2001 4:19 a     MediaPlayer.h
5,982         Jun. 04, 2001 10:13 a   metadata.cpp
33,088        Jun. 05, 2001 1:13 p    MetaDataDlg.cpp
3,754         Jun. 01, 2001 10:21 a   MetaDataDlg.h
13,369        Jun. 05, 2001 1:31 a    NotAuthored.cpp
3,292         May 24, 2001 6:31 a     NotAuthored.h
11,161        Jun. 05, 2001 3:53 p    playlist.cpp
5,676         May 24, 2001 4:20 a     ReadMe.txt
13,465        Jun. 04, 2001 10:19 a   Resource.h
3,001         May 21, 2001 12:08 p    resource.h.ejb
2,622         May 21, 2001 12:08 p    resource.h.old
6,716         Jun. 04, 2001 3:53 p    segment.cpp
3,608         May 31, 2001 1:54 p     segment.h
1,227         May 21, 2001 12:08 p    segmentList.cpp
1,598         May 21, 2001 12:08 p    segmentList.h
42,155        Jun. 05, 2001 1:13 p    SegmentListCtrl.cpp
8,025         May 28, 2001 6:52 a     SegmentListCtrl.h
21,934        May 21, 2001 12:08 p    segmentlistctrl.sav
19,437        Jun. 05, 2001 10:02 a   SettingsDlg.cpp
3,086         May 28, 2001 6:56 a     SettingsDlg.h
209           May 21, 2001 12:08 p    StdAfx.cpp
1,195         May 21, 2001 12:08 p    StdAfx.h
8,653         Jun. 05, 2001 1:33 a    ThankYouDlg.cpp
2,288         Jun. 04, 2001 10:08 a   ThankYouDlg.h
11,679        Jun. 04, 2001 10:25 a   ToolBarDlg.cpp
662           May 21, 2001 12:07 p    CAnimate.cpp
5,993         Apr. 19, 2001 3:39 p    TestAccess.java
3,060         Mar. 27, 2001 5:16 p    ER.TVP
2,043         Apr. 19, 2001 3:55 p    GDBPool.java.txt
7,307         Apr. 23, 2001 12:16 p   GGUPIServlet.java
1,472         Apr. 27, 2001 5:23 p    GPlayListSet.java.txt
1,189         Apr. 27, 2001 5:26 p    GPlayListSetFromCache.java.txt
1,207         Apr. 27, 2001 5:28 p    GPlayListSetFromDatabase.java.txt
1,187         Apr. 27, 2001 5:27 p    GPlayListSetFromFile.java.txt
7,337         Apr. 23, 2001 12:08 p   GPostServlet.java
3,592         Apr. 27, 2001 5:16 p    GQuery.java.txt
7,308         Apr. 23, 2001 12:16 p   GQueryServlet.java
1,643         Apr. 19, 2001 5:05 p    GService.java.txt
825           Apr. 12, 2001 12:24 a   GTest.java
1,132         Apr. 20, 2001 2:00 p    JDBCConfig.properties
5,604         Apr. 23, 2001 11:51 a   SnoopServlet.java
5,874         Apr. 19, 2001 12:10 a   Copy of TestAccess.java.txt
935           May 29, 2001 1:01 p     tvp.php
358           May 25, 2001 7:30 a     frames.html
12,051        Jun. 05, 2001 3:25 p    gspot.php
919           May 22, 2001 6:37 a     movie2.php
636           May 22, 2001 6:37 a     player.js
4,546         Jun. 01, 2001 4:23 p    sreplace.php
983           May 31, 2001 6:04 a     file.ph

COPYRIGHT STATEMENT

[0003] A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

FIELD OF THE INVENTION

[0004] This invention relates to audio and video program reception, storage, editing, recording and playback systems and more particularly to methods and apparatus for distributing, recording, organizing and editing metadata that is used to selectively distribute, record, organize, edit and play program content.

BACKGROUND OF THE INVENTION

[0005] Historically, the viewing experience of TV has been governed by the content and programming of the service providers, broadcasters and networks who decide when programs will be available and their duration. While lifestyles have become more complex and the content available to the viewer has increased, it has become more desirable to allow viewers to control this form of entertainment on their own terms. While video cassette recorders (VCRs) allow viewers to capture content for future playback, the VCR has been plagued with limitations inherent in the analog tape media and the difficulty viewers commonly experience in programming these devices to record selected future programs.

[0006] The recent advent of digital video recorders (DVRs), coupled with the more intuitive electronic program guides (EPGs) used in popular DVRs, has provided new and simplified recording options for viewers. In addition, as a useful byproduct of digital storage, DVRs provide the ability to pause, replay and fast-forward the playback of time-shifted programming. However, as the number of available channels and the volume and diversity of available content increase, currently available DVRs and program guides will not provide the needed ability to play back and scan volumes of stored video with simple controls and with minimal knowledge of the available content. Viewers will need more information about the content to help them navigate between programs and within a particular program.

SUMMARY OF THE INVENTION

[0007] In a principal aspect, the present invention takes the form of methods and apparatus for selectively reproducing recorded video program segments retrieved from a mass storage device under the control of playlist metadata which identifies a selected set of the stored segments and the ordered sequence in which those segments are to be reproduced in the absence of an intervening control command from the viewer. The playlist metadata includes a text description of each segment in the sequence. In response to a request from the viewer, a segment guide listing containing the text description of each segment is displayed, with the text description of the currently playing segment being visually identified on the list. Control means operated by the viewer permit the viewer to choose a different segment to be viewed by selecting the text description of that segment on the displayed guide listing.

[0008] In accordance with a further feature of the invention, attribute data is associated with at least selected ones of the stored video program segments and means are provided for selecting and sorting stored segments based on the attribute data.
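
By way of illustration only, the playlist-driven playback and segment guide summarized above might be modeled as in the following Python sketch. It is not taken from the program listing appendix; the names PlaylistEntry, SegmentGuide and sort_by_attribute, and the use of a ">" marker to stand in for visual identification on a display, are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PlaylistEntry:
    segment_id: str       # key of a stored video segment (hypothetical)
    description: str      # text description shown in the segment guide
    attributes: dict = field(default_factory=dict)  # e.g. {"topic": "news"}


class SegmentGuide:
    """Tracks the segment being reproduced and the highlighted guide row."""

    def __init__(self, playlist):
        self.playlist = list(playlist)
        self.playing = 0       # index of the segment being reproduced
        self.highlighted = 0   # index of the row highlighted in the guide

    def advance(self):
        """Default behavior with no viewer command: next segment in order."""
        if self.playing + 1 < len(self.playlist):
            self.playing += 1
            return True
        return False

    def open_guide(self):
        """Show the guide with the currently playing segment identified."""
        self.highlighted = self.playing
        return self.render()

    def render(self):
        # '>' stands in for whatever visual highlight the display uses.
        return [
            ("> " if i == self.highlighted else "  ") + entry.description
            for i, entry in enumerate(self.playlist)
        ]

    def move_highlight(self, delta):
        """Selection command: highlight a different text description."""
        last = len(self.playlist) - 1
        self.highlighted = max(0, min(last, self.highlighted + delta))

    def select(self):
        """Further command: reproduce the segment that is highlighted."""
        self.playing = self.highlighted
        return self.playlist[self.playing].segment_id


def sort_by_attribute(entries, key):
    """Order entries by one attribute; missing values sort first."""
    return sorted(entries, key=lambda e: str(e.attributes.get(key, "")))
```

In this sketch the playlist order governs playback until the viewer opens the guide, moves the highlight and issues a further command, which mirrors the sequence of steps recited above.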

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] FIG. 1 is a schematic block diagram that illustrates the functional components which are used in a preferred embodiment of the invention and which operate at both a remote location and at one of the user locations to implement the invention;

[0010] FIG. 2 is a data flow diagram illustrating the manner in which video program content and descriptive metadata is transferred between content and service providers and a personal video recording device operated by a viewer in accordance with the invention; and

[0011] FIGS. 3-5 illustrate screen layout displays illustrating the manner in which program segment guides are displayed to enable the user to interactively control program segment playback as defined by playlist metadata.

DETAILED DESCRIPTION

[0012] Background

[0013] The present invention belongs to a family of related systems that use metadata to control the playback of broadcast programming as disclosed in the previously issued patents and previously filed applications summarized below.

[0014] U.S. Pat. Nos. 5,892,536 and 5,986,692, issued to James D. Logan et al., describe systems which employ metadata to selectively store, manipulate and play back broadcast programming. Some of the novel arrangements and features disclosed in those two patents may be summarized as follows:

[0015] 1. A remote editing station, which may be at the broadcast facility or at a remote location, classifies, describes or otherwise identifies individual segments of broadcast programming and sends metadata (sometimes referred to as "markup data") identifying and describing those segments to a remote client receiver. For example, the markup data may identify individual segments by specifying the source and the time of the original broadcast, or by specifying some other unique characteristic of the broadcast signal. The program segments may be TV, radio, or Internet programs, or portions of programs, including individual songs, advertisements, or scenes.

[0016] 2. The communication link used to transmit the metadata to the client may take one of several forms, including the Internet, a dialup telephone link, the communications pathway used to carry the broadcast signals to the client, or other forms of communication used to transport the metadata to the client.

[0017] 3. At the client receiver, the metadata is used to identify particular program segments that may then be manipulated in one or more of a variety of ways. For example, the metadata may be used to selectively play back or record particular segments desired by the user; to re-sequence the identified segments into a different time order; to "edit-out" undesired portions of identified segments; to splice new information, such as computer text or advertising, into identified segments for rendering with the program materials, or to substitute different material (e.g. dubbing in acceptable audio to replace profanity to make programming more acceptable to minors).

[0018] 4. The client receives and locally stores incoming broadcast programming and uses the markup data to identify desired segments within the stored program materials. The local storage mechanism may advantageously include means for concurrently recording live broadcasting while replaying a delayed version of the previously recorded programming as described in U.S. Reissue Patent 36,801 issued to James D. Logan et al.

[0019] 5. The markup data can provide a detailed "electronic program guide" to the broadcast programming previously received and stored in a personal video recorder (PVR) or an audio storage device, permitting the user to selectively play back a desired segment or portion of the programming previously recorded.

[0020] 6. The markup data may be used to create a recorded collection of desired segments extracted from the buffered broadcast, allowing the desired segments to be saved while the remainder of the buffered materials is discarded to conserve recording space.

[0021] 7. Special markup signals may be selectively sent to individual subscribers based on his or her indicated preferences so that only preferred program segments are identified and processed. For example, a subscriber might request markup data only for sports and news.

[0022] U.S. Pat. No. 6,088,455 issued to James D. Logan et al. describes related systems that use a signal analyzer to extract identification signals from broadcast program segments. These identification signals are then sent as metadata to the client where they are compared with the received broadcast signal to identify desired program segments. For example, a user may specify that she likes Frank Sinatra, in which case she is provided with identification signals extracted from Sinatra's recordings which may be compared with the incoming broadcast programming content to identify the desired Sinatra music, which is then saved for playback when desired.

[0023] U.S. patent application Ser. No. 09/536,969 filed by James D. Logan et al. on Mar. 28, 2000 describes further systems that employ metadata for selectively recording and/or reproducing received broadcast programming. The implementations disclosed in that application employ:

[0024] 1. A receiver connected to record incoming broadcast signals and a PC connected to a web server via the Internet. A browser program running on the PC uses the web interface provided by the web server, selects songs of interest, downloads identification signals (e.g., extracted feature-sets or signatures) which uniquely identify the content of desired program segments (songs), which are then selectively saved for reproduction.

[0025] 2. A signal processor that identifies characteristics of the stored programming (scene changes, voice vs. music, voices of particular people, etc.) that can be used to selectively store desired programming.

[0026] 3. Identification signals derived at the client from received broadcast programming, which are sent to a remote server that compares them with a database at the server and returns attribute information to the client to describe the recognized programming. The attribute information can include the title of the segment, the name of the performing artist, albums that have a recording of this segment, etc.

[0027] 4. Program segment files (e.g. songs) in a server library that are made available to those client locations which demonstrate that they are entitled to access the library copy by sending an identification signal to the server that is extracted from a copy of the desired segment already in the client's possession. Thereafter, a qualified client can obtain the authorized copy from the server from remote locations. Locally recorded programming can be uploaded from a client into the library, and such uploading can be "virtual" (that is, need not actually take place) when an equivalent copy of the same program segment is already stored in the server library.

[0028] U.S. Pat. Nos. 5,271,811, 5,732,216, and 6,199,076, and co-pending application Ser. No. 09/782,546 filed on Feb. 13, 2001, by James D. Logan et al. describe an audio program and message distribution system which incorporates the following features:

[0029] 1. A host system organizes and transmits program segments to client subscriber locations.

[0030] 2. A scheduling file of metadata schedules the content and sequence of a playback session, which may then be modified by the user.

[0031] 3. The content of the scheduled programming is varied in accordance with preferences associated with each subscriber.

[0032] 4. Program segments are associated with descriptive subject matter segments, and the subject matter segments may be used to generate both text and audio cataloging presentations to enable the user to more easily identify and select desirable programming.

[0033] 5. A playback unit at the subscriber location reproduces the program segments received from the host and includes mechanisms for interactively navigating among the program segments.

[0034] 6. A usage log is compiled to record the subscriber's use of the available program materials, to return data to the host for billing, to adaptively modify the subscriber's preferences based on actual usage, and to send subscriber-generated comments and requests to the host for processing.

[0035] 7. Voice input and control mechanisms included in the player allow the user to perform hands-free navigation of the program materials and to dictate comments and messages, which are returned to the host for retransmission to other subscribers.

[0036] 8. The program segments sent to each subscriber may include advertising materials, which the user can selectively play to obtain credits against the subscriber fee.

[0037] 9. Parallel audio and text transcript files for at least selected programming enable subject matter searching and synchronization of the audio and text files.

[0038] 10. Speech synthesis may be used to convert transcript files into audio format.

[0039] 11. Image files may also be transmitted from the server for synchronized playback with the audio programming.

12. A text transcript including embedded markup flags may be used to provide a programmed multimedia presentation including spoken audio text created by speech synthesis synchronized with presentation of images identified by the markup tags.

[0040] U.S. Utility application Ser. No. 10/060,001 filed on Jan. 29, 2002 entitled "Audio and Video Program Recording, Editing and Playback Systems Using Metadata" describes means at the user's location for creating metadata which may be used in combination with metadata provided by an external source, for editing metadata in various ways at the user's location, for automatically responding to user activity to generate new metadata which characterizes the user's preferences and which serves to automatically identify and describe (or rate) programming segments, and for responding in novel ways to the available metadata to enhance the utility and enjoyment of available broadcast materials. Methods and apparatus are employed for selectively controlling the presentation of broadcast programming in which a user viewing or listening to broadcast programming at a first location may take advantage of the insights provided by a different viewer at another location in order to control the manner in which segments of the broadcast programming are recorded and/or replayed.

[0041] The disclosure of each of the foregoing patents and applications is incorporated herein by reference.

[0042] Architectural Overview

[0043] The methods and apparatus contemplated by the present invention facilitate the selective storage, organization and reproduction (playback) of broadcast programming through the use of metadata that identifies and describes segments of that broadcast programming. This metadata can be created locally or at a remote site and transmitted to the user's location to enable the user to more effectively manage broadcast programming received at the user's location.

[0044] FIG. 1 illustrates in schematic form the manner in which information is processed in accordance with the invention. As will be described in more detail below, many of the structures and functions illustrated in FIG. 1 represent alternative or optional functionality that need not be present in all implementations of the invention.

[0045] At the remote location, broadcast programming from a source 100 is received at 101 and may be processed immediately or saved in a storage unit 103 for later processing. The incoming broadcast programming signals may be received as a live public broadcast, or may take the form of programming content received prior to the time of its later public broadcast. At 105, the incoming broadcast signals are parsed or subdivided into logically separate segments, which need not be contiguous and which may be overlapping or nested. The individual segments may be processed immediately after they are identified during the parsing process, or they may be stored for future processing in a storage unit 107.

[0046] As illustrated at 111, metadata is then created which describes each of the identified programming segments. The metadata describing each segment may take the form of a separate data entity, or may be stored or transmitted with the content of the programming segment which it describes. Unless the metadata is associated with a particular segment by being stored or transmitted with that segment, it includes a pointer or some other mechanism for specifying the segment or segments it describes. In addition, the metadata typically includes additional descriptive information about the associated segment(s). The metadata created at 111 may be immediately processed or transmitted to the user after it is created, or may be stored for later processing or transmission in a storage unit illustrated at 113.
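
The relationship between a metadata item and the segment it describes may be pictured with the following Python sketch, offered only as an assumed data layout. The names SegmentPointer, SegmentMetadata, validate and covers are hypothetical, and a pointer expressed as a source plus start and end times is just one of the mechanisms the description permits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SegmentPointer:
    source: str    # e.g. a channel or station identifier (assumed form)
    start: float   # segment start, in seconds on the source's timeline
    end: float     # segment end, in seconds on the source's timeline


@dataclass
class SegmentMetadata:
    description: str                           # descriptive information
    pointer: Optional[SegmentPointer] = None   # None when carried with content


def validate(meta: SegmentMetadata, carried_with_segment: bool) -> None:
    """Metadata kept as a separate entity must point at its segment."""
    if meta.pointer is None and not carried_with_segment:
        raise ValueError("standalone metadata needs a pointer to its segment")


def covers(meta: SegmentMetadata, source: str, t: float) -> bool:
    """True if the metadata's pointer spans time t on the given source."""
    p = meta.pointer
    return p is not None and p.source == source and p.start <= t < p.end
```

Metadata that travels with its segment can omit the pointer, while metadata stored or transmitted separately fails validation without one.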

[0047] Only selected items of metadata may be transmitted to the user location. The specific metadata transmitted may be selected as shown at 115 in a variety of ways. Data describing the demographics of individual users and data specifying user preferences stored at 117 may be used to selectively provide the user with only that portion of the available metadata which is best suited to the needs of the user or which a third party, such as an advertiser, desires to make available to the user.
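
The selection step at 115 might, for example, be sketched in Python as follows. The dictionary keys "segment" and "category" and the function name select_metadata are assumptions made for illustration; the specification does not prescribe a record format or a selection rule.

```python
def select_metadata(items, preferences, sponsored=()):
    """Return only the metadata items to be transmitted to one user.

    items:       list of dicts with hypothetical keys 'segment' and 'category'
    preferences: categories the user has asked for (as stored at 117)
    sponsored:   segment ids a third party wishes to make available
    """
    wanted = set(preferences)
    promoted = set(sponsored)
    return [
        item for item in items
        if item["category"] in wanted or item["segment"] in promoted
    ]
```

A subscriber who requested only sports and news would thus receive markup for those categories, plus any items a third party such as an advertiser has designated.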

[0048] Note that metadata created by the user, or preference data supplied by the user or derived from an analysis of the user's use of the system, or from the viewer's demographic characteristics, may be combined with or used instead of metadata and preference data created at the remote location.

[0049] Note also that the content of broadcast programming received at the remote site may be forwarded to the user location with or separately from the corresponding metadata. This content information may take the form of the broadcast programming received at the remote site at 101, previously received programming stored at 103, and individual segments as parsed at 106 and stored at 107. As noted above, the metadata associated with these programming signals may be combined with the programming content as transmitted to the user, or may be sent separately over the same or a different communications pathway.

[0050] The communication methods or apparatus used to transport metadata and/or content to the user as illustrated at 130 may take many different forms, including: the Internet, a dialup telephone connection through the public switched telephone network (PSTN), a wireless transmission system, cable, private line facilities, or data storage media transported from the content publisher and/or the metadata creator to the user. The communications may take place over a combination of such facilities and, as noted earlier, the content and metadata may be transmitted in one or both directions together or separately over the same or different facilities.

[0051] Metadata created at the remote location and transmitted via the communications facility 130 may be stored at 133 at the user location. The metadata stored at 133 may be edited at the user location as indicated at 135, and metadata from the user location may be returned via the communications facility 130 to the remote location for shared use by others.

[0052] At the user location, broadcast programming signals are received at 141, either in the form of a live public broadcast from the source 100, or as programming content received from the remote location via the communications link 130. It is a leading purpose of the present invention to provide the user with a better and more convenient way to identify and reproduce that portion of the large quantity of programming that is broadcast for general consumption from many sources via many pathways, including conventional radio and television broadcasting, whether over the airwaves or via a cable or satellite facility. The metadata that is provided from the remote location via the communications pathway(s) 130 may be used to selectively store, organize and/or selectively reproduce programming received directly at the user location from a source 100, or received together or separately with the metadata via the pathway 130.

[0053] The broadcast programming content received at the user location at 141 may be immediately processed or stored for later processing and viewing. As described in U.S. Reissue Patent 36,801 issued to James D. Logan et al., the incoming broadcast programming may be concurrently viewed or otherwise processed while it is being recorded in a circular buffer for possible future use. A reserved portion of the storage unit seen at 143 may implement the circular buffer. This allows the user to utilize VCR-type controls to pause and selectively replay or process previously broadcast programming at different forward and reverse playback rates. With the pause capability, the system is constantly recording the last 5 minutes or so of a live radio broadcast, or the last 30 minutes or so of a live television broadcast. When the user hears or views a song or program that he or she likes, the user presses a "Catch" button, and the program will set aside all or a predetermined part of the stored programming in the circular buffer, as well as a further predetermined part of the incoming broadcast that continues the saved portion, and retain both in temporary storage at 103. Later, metadata may be applied to that segment identifying the beginning and end of the program or song being played at the time the Catch button was activated. If the button was hit after a program or song was over, but before or after another began, the system would assume the user was trying to capture the last played song.
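
A minimal Python sketch of the circular buffer and "Catch" behavior described above follows. The class name CatchBuffer and its parameters are hypothetical, and the buffer is measured in abstract chunks rather than minutes of audio or video; an actual recorder would operate on encoded media in the storage unit.

```python
from collections import deque


class CatchBuffer:
    """Circular buffer of recent broadcast chunks with a 'Catch' action."""

    def __init__(self, capacity, keep_before, keep_after):
        self.buffer = deque(maxlen=capacity)  # oldest chunks fall off the end
        self.keep_before = keep_before  # buffered chunks saved on Catch
        self.keep_after = keep_after    # further incoming chunks to append
        self.caught = None              # chunks set aside by the last Catch
        self.remaining = 0              # incoming chunks still to be saved

    def receive(self, chunk):
        """Record an incoming chunk; extend a pending Catch if one is open."""
        self.buffer.append(chunk)
        if self.remaining > 0:
            self.caught.append(chunk)
            self.remaining -= 1

    def catch(self):
        """Set aside the most recent buffered chunks and keep recording."""
        recent = list(self.buffer)
        self.caught = recent[-self.keep_before:] if self.keep_before else []
        self.remaining = self.keep_after
```

The saved portion therefore spans the moment the button was pressed, so that metadata arriving later can trim it to the true beginning and end of the program or song.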

[0054] Unless received in already parsed form from the remote location, the incoming broadcasts are parsed at 145 into segments that correspond to the segments created at the remote location at 105. As noted earlier, the available metadata may be used to subdivide the incoming broadcast signals into segments. For example, the metadata may identify incoming segments by source and by start and end times. Alternatively, the metadata may include a "fingerprint" or "signature" signal pattern that can be compared with incoming broadcast signals to identify particular segments, and may further include timing information, which specifies the beginning and ending of each segment relative to the location of the unique signature.
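
The signature-based alternative may be illustrated with the short Python sketch below. It assumes, purely for the example, that the received signal has been reduced to a list of feature values and that an exact match of consecutive values stands in for whatever comparison a practical signal analyzer would use; the function names and offset parameters are likewise hypothetical.

```python
def find_signature(stream, signature):
    """Return the index where the signature first occurs, or None."""
    n = len(signature)
    for i in range(len(stream) - n + 1):
        if stream[i:i + n] == signature:
            return i
    return None


def locate_segment(stream, signature, start_offset, end_offset):
    """Find segment bounds from timing given relative to the signature.

    start_offset and end_offset are positions relative to the start of the
    signature (negative means earlier), as carried in the metadata.
    """
    pos = find_signature(stream, signature)
    if pos is None:
        return None
    start = max(0, pos + start_offset)
    end = min(len(stream), pos + end_offset)
    return start, end
```

Because the timing information is relative to the signature, the same metadata locates the segment wherever it appears in the received stream.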

[0055] After individual segments have been identified in the incoming broadcast stream at 145, they may be immediately processed or stored for future use in the storage unit 147. Not all of the segments that are identified may be of further use; accordingly, the available metadata may be used to select or discard particular segments as indicated at 151, and to process only the remaining segments, or selectively store them for future processing or playback at 153.

[0056] At 161, the selected segments may be modified or reorganized in a variety of ways in accordance with the metadata. For example, the sequence in which program segments are presented for playback may be modified, and programming materials not necessarily included with the originally broadcast materials may be "spliced" into the presentation, or all or part of selected segments may be deleted from the presentation. The resulting program content which is in condition for playback may be immediately presented to the user, or it may be stored at 163 for selective playback at a more convenient time as indicated at 171 and 190.
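
The modifications performed at 161 could be sketched in Python as shown below. The function build_presentation and its arguments are assumptions for illustration; segment content is represented by simple strings, and only whole-segment deletion is shown.

```python
def build_presentation(segments, order, splices=None, deleted=()):
    """Assemble playback content from stored segments under metadata control.

    segments: dict mapping a segment id to its content
    order:    segment ids in the sequence the metadata specifies
    splices:  dict mapping a segment id to extra items played after it
    deleted:  segment ids to omit (their splices are omitted as well)
    """
    splices = splices or {}
    presentation = []
    for seg_id in order:
        if seg_id in deleted:
            continue
        presentation.append(segments[seg_id])
        presentation.extend(splices.get(seg_id, []))
    return presentation
```

Here the metadata, not the original broadcast order, determines what is presented, which corresponds to the re-sequencing, splicing and deletion described in the preceding paragraph.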

[0057] As illustrated in FIG. 1 at 180 and 135, the user may create descriptive metadata and may edit metadata previously received or created in a variety of ways to personalize the storage, reorganization and playback of available broadcast programming.

[0058] It should be observed that the process of creating and editing metadata may be based on any one of the various versions of the received content; that is, the content as received and stored at 143, as parsed and stored at 147, as reduced to specific segments remaining after the selection and discarding process at 151, and as modified at 161 and stored for viewing at 163.

[0059] It is also important to note that the parsing, selection and modification processes may be performed at different times using, in each case, the most recently stored version of the programming content and the metadata that is available at that time. For example, metadata that is used to parse incoming segments at 145 may be made available from the parser 105 at the remote facility at an earlier time than descriptive metadata arrives from the remote creation process 111. The presence of the storage unit 143 allows received broadcasting signals to be held until parsing metadata arrives which will subdivide the received programming into logical units that can then be still later selected and modified with the aid of descriptive metadata that arrives only after it is created by the remote editing process. Note also that the metadata which arrives first to subdivide the programming stream into logical segments, as well as available metadata which describes those segments, facilitates the task at the user location of generating still further supplemental metadata which describes, rates, annotates or recommends programming content for other users.

[0060] In the description that follows, many of the features and functions summarized above and illustrated in FIG. 1 will be presented in more detail.

[0061] Program Source 100

[0062] The present invention contemplates the creation and use of metadata for describing and manipulating programming content of the type typically broadcast for public consumption by radio and television broadcast stations; disseminated by cable and satellite systems and, more recently, via the Internet; or published for general consumption on data storage media, such as DVD disks. This broadcast programming may be in analog or digital form and, in some instances, may be obtained from a content provider prior to being broadcast. It is important to observe that the "broadcast programming" from the source 100 is available for processing at both a remote station and at the user's location as illustrated in FIG. 1.

[0063] The principal illustrative embodiment as described below is used to select, organize, disseminate, store and reproduce television broadcast programming. It should be understood however that the principles of the invention are, with few exceptions, equally applicable to radio broadcast programming, to programming that is published via the Internet, and to programming such as movies which are transported to the user on published data storage media, such as DVD disks.

[0064] Storage Unit 103

[0065] While the parsing of programming content into segments and the association of descriptive metadata with those segments may be automated to some extent as described later, it is frequently desirable to provide one or more human-operated editing workstations which can be used to adjust or "fine tune" the time position of markers which delimit the beginning and end positions of segments, and to manually compose descriptive metadata, provide qualitative rankings, and to otherwise classify or describe the content of each segment. The use of storage units 103 and/or 107 permits unparsed and parsed programming to be temporarily stored so that it may be processed through one or more multichannel editing stations where a single operator can effectively ensure the accuracy of the parsing process and the addition of descriptive metadata to a plurality of concurrently received broadcast programming channels. These multichannel editing stations are used at 106 and 111 to subdivide program content into segments described by metadata. The multichannel editing stations employ variable speed playback techniques to control the placement of time markers, and may display closed caption text in an editing window to assist the human editor in composing descriptive metadata and classifying the content. These multichannel editing stations scan the programming content in storage units 103 and/or 105 and place the resulting metadata in the store 113 for distribution to users as discussed later.

[0066] Parsing Broadcast Programming at 105

[0067] Automated means may be used to subdivide programming into segments. For example, segment delimiters may be created in response to the detection of scene changes (frequently indicated by blank frames in TV content) or by abrupt changes in overall image content when backgrounds change. In addition, voice recognition processing may be used to detect and automatically map the times when particular individuals are speaking. Predetermined image content may be detected to identify repeatedly used screen displays at the beginning and end of programs, and at program breaks or intermissions. In the same way, audio recognition may be used to identify standard theme songs and announcements used at the beginning of certain program segments. When such standard elements also serve as program segment identifiers, they may be associated with standard descriptive metadata that is automatically accessed from a library and added to form all or part of the descriptive metadata that is associated with the identified segment.

[0068] Frequently, when parsing television programming, the audio and video components have different time boundaries. For example, if it were desired to subdivide a football game telecast on a play-by-play basis, the audio description may well begin before and extend well after the actual play as seen in the video component. If the programming was segmented based on the video alone, the audio would be segmented in a somewhat non-optimal fashion while, in the same way, isolating a video segment alone might cut speakers off in mid-sentence. This follows from the fact that commentary is frequently not timed to occur between the beginning and end of the activities shown on the video portion of the program.

[0069] Thus, it would be advantageous at times to split the audio at a different point from the video. This strategy might result, however, in interrupting a commentary underway when the next visual segment began. As long as the audio structure does not match the video structure, the human editor should be provided with the ability to independently select different beginning and end points for the video and audio segments, and then be provided with mechanisms for shortening the longer of the two, or lengthening the shorter of the two. The audio content may be lengthened simply by adding one or more periods of silence to the audio stream, and may be shortened by deleting silent periods to compress the presentation. The video presentation can be lengthened by adding "filler content", by adding freeze-frame displays, or by reproducing content in slow motion.
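
As an illustration of the duration matching described above, the following Python sketch computes either the silence to be appended to a shorter audio segment or the amount remaining in each silent period after a longer audio segment is compressed. The function name `reconcile_audio` and its parameters are hypothetical and are not part of the described system; all durations are assumed to be expressed in seconds.

```python
from typing import List, Tuple


def reconcile_audio(audio_s: float, video_s: float,
                    silences: List[float]) -> Tuple[float, List[float]]:
    """Return (silence_to_add, remaining_silences) so the audio approaches the video length.

    silences holds the durations of the silent periods found in the audio.
    """
    diff = video_s - audio_s
    if diff >= 0:
        # Audio is the shorter of the two: lengthen it by adding silence.
        return diff, list(silences)
    # Audio is the longer of the two: shorten it by deleting silent time.
    excess = -diff
    remaining = []
    for gap in silences:
        cut = min(gap, excess)
        remaining.append(gap - cut)
        excess -= cut
    return 0.0, remaining
```

If the silent periods are too short to absorb the whole difference, the audio remains longer than the video, and one of the video-lengthening mechanisms noted above would still be needed.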

[0070] Another strategy is to optimize the "smoothness" of the audio splits at the expense of the video splits. Thus, instead of splitting the video at the moment the ball is hiked, it might be split at a logical break point for the audio at some point before the ball is hiked. This might make the video a bit choppier, but the audio smoother. Note that the image presentation can be more easily lengthened or shortened by slowing or speeding the display rate or by duplicating or deleting frames, whereas the human ear easily detects the change in pitch when an attempt is made to alter the presentation rate of an audio signal.

[0071] Other methods which may be used to separate out the songs from the audio stream use signal analysis to distinguish music from talkover or to distinguish one song from another.

[0072] Another method used to determine a markup point which will eliminate song talkover is to estimate the likely spot of the end of talkover by employing a database that specifies how far into a song one must go before finding the lyrics or main theme of each song, or the point at the end of the song when the lyrics or main theme ends. This "start and end of music" point would be used as a best guess as to when the DJ talkover stopped (or, if at the end of the song, when the talkover was likely to start again). The DJs themselves often have this information and often use it as a guide that allows them to talk right up until the point that the music starts. For stations that continually employ talkover, putting markers at these predetermined start and end points would provide assurance that no talkover was played with the song.
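
The database lookup described above may be sketched in Python as follows. The record layout, the `TIMING_DB` table, its single sample entry, and the function name are all hypothetical and serve only to illustrate how stored intro and outro points could be converted into playback markers.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SongTiming:
    intro_s: float  # seconds from song start until the lyrics or main theme begin
    outro_s: float  # seconds from song start until the lyrics or main theme end


# Hypothetical database keyed by a song identifier (sample values only).
TIMING_DB = {"example-song": SongTiming(intro_s=12.0, outro_s=185.0)}


def talkover_free_markers(song_id: str, song_start_s: float) -> Tuple[float, float]:
    """Place start and end markers where talkover is least likely to be heard."""
    timing = TIMING_DB[song_id]
    return song_start_s + timing.intro_s, song_start_s + timing.outro_s
```

For a song detected 100 seconds into a recording, the markers for the sample entry would be placed at 112 and 285 seconds.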

[0073] It should be noted that segments are not necessarily the contiguous results of subdividing the original programming signal. Segments can be unique, or can overlap or be nested within other segments. Moreover, a segment is not necessarily a subpart of an individual program as broadcast. A segment may be a combined collection or sequence of such programs, may correspond precisely to a single program as broadcast, or may be only one part of a longer program. Importantly, segments may be organized into groups of other segments or programs, and can form a hierarchy of sections, chapters, sub-chapters, etc. Thus, a single metadata entity may be associated in a variety of ways with a plurality of segments, while other metadata entities may be associated with only one segment.

[0074] Storing Parsed Segments

[0075] The programming content that is stored in discrete segments at 107 need not be a direct reproduction of the incoming program signal. Redundancies and overlapping content may be advantageously removed. For example, when audio content is stored, the periods of silence may be removed for more compact storage. The content signal may also be compressed if desired using, for example, MPEG compression for video and MP3 compression for radio broadcast programming. A linear programming process may be used to allocate segments for the scarce viewing time allocated by the user.
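
The allocation of stored segments to a limited viewing time may be illustrated by the following Python sketch. It substitutes a simple dynamic-programming (0/1 knapsack) selection for the linear programming process mentioned above, and it assumes that each segment carries a whole-second duration and a numeric interest score; the function name and the scoring scheme are hypothetical.

```python
from typing import List, Tuple


def choose_segments(segments: List[Tuple[str, int, float]], budget_s: int) -> List[str]:
    """Pick the segments with the highest total score that fit within budget_s seconds.

    Each segment is a (name, duration_s, score) tuple.
    """
    # best[t] holds (total score, chosen names) using at most t seconds.
    best = [(0.0, [])] * (budget_s + 1)
    for name, dur, score in segments:
        # Iterate downward so that each segment is used at most once.
        for t in range(budget_s, dur - 1, -1):
            candidate = best[t - dur][0] + score
            if candidate > best[t][0]:
                best[t] = (candidate, best[t - dur][1] + [name])
    return best[budget_s][1]
```

With a 900 second budget and candidate segments of 300, 600 and 120 seconds scored 5, 6 and 2 respectively, the first two segments would be chosen because together they exactly fill the available time with the highest combined score.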

[0076] After the continuous broadcast data has been assembled into individual segments, it is frequently preferable to store those segments so that the descriptive metadata which describes each segment can be created, automatically or by a human editor, or both, and be associated with the program content at a pace which is independent of the rate at which the segments were originally broadcast. The nature of the descriptive metadata as thus created is described next.

[0077] Creating Metadata at 111

[0078] First, it should be noted that if the metadata is not positionally associated with the segments it describes by being imbedded with, or transmitted at the same time as, the content data, some of the metadata performs the function of identifying the associated program content.

[0079] Stored segments may be identified by a file name, a URL, or by some other unique access key (such as the primary key value in a relational database storage system). When segments can be identified and accessed when needed using such an access key, simply including that key value with the descriptive metadata suffices.

[0080] However, when metadata created at the remote location must be associated with program content received at the user location, a different mechanism is needed. As one approach, the program segment may be specified by the combination of an identifier which specifies a broadcast program source (e.g. a particular broadcasting station or cable channel) together with the start and ending times at which the particular programming segment was broadcast. These "time stamp" values are sent with the metadata to the user location and matched against time stamp information associated with the broadcast programming when received at the user station. For example, a TV program segment may be identified by data indicating the segment was broadcast by WGN from 11:23:42 to 11:32:16 GMT on Oct. 12, 2000.
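
The source-and-time-stamp identification described above may be sketched in Python as follows. The `SegmentId` record and the `locate_in_recording` function are hypothetical names; the sketch assumes that the user station knows the source and the start and end times of each stored recording.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class SegmentId:
    source: str      # broadcast source, e.g. a station or cable channel
    start: datetime  # time at which the segment began to be broadcast
    end: datetime    # time at which the segment finished


def locate_in_recording(seg: SegmentId, rec_source: str, rec_start: datetime,
                        rec_end: datetime) -> Optional[Tuple[float, float]]:
    """Return (start, end) offsets in seconds into a local recording, or None
    if the recording is from another source or does not contain the segment."""
    if seg.source != rec_source or seg.start < rec_start or seg.end > rec_end:
        return None
    return ((seg.start - rec_start).total_seconds(),
            (seg.end - rec_start).total_seconds())


# The example segment given in the text above.
wgn_segment = SegmentId(
    "WGN",
    datetime(2000, 10, 12, 11, 23, 42, tzinfo=timezone.utc),
    datetime(2000, 10, 12, 11, 32, 16, tzinfo=timezone.utc),
)
```

Against a recording of the same source that began at 11:00:00 GMT, the example segment would be found between 1422 and 1936 seconds into the recording.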

[0081] At times, predetermined time shifts occur when programs are distributed over cable facilities and the like. When that occurs, predetermined time offsets can be added to or subtracted from the values specified in the metadata, either before or after the metadata is transmitted to the user location. The magnitude of these standard offsets may be determined by detecting the time when predetermined signal patterns are received at the user location, comparing that time with the time when that signal pattern was broadcast as measured at the remote station to generate the offset value to be applied to all segments experiencing the same time shift as the predetermined signal pattern.
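
The offset measurement described above amounts to a subtraction followed by a uniform shift, as the following Python sketch illustrates. The function names are hypothetical, and times are assumed to be expressed in seconds on a clock shared by the remote station and the user location.

```python
from typing import Tuple


def measure_offset(pattern_broadcast_s: float, pattern_received_s: float) -> float:
    """Delay between broadcast of a known signal pattern and its reception."""
    return pattern_received_s - pattern_broadcast_s


def shift_segment(start_s: float, end_s: float, offset_s: float) -> Tuple[float, float]:
    """Apply the measured offset to the time stamps carried in the metadata."""
    return start_s + offset_s, end_s + offset_s
```

A pattern broadcast at 1000 seconds and received at 1007.5 seconds yields an offset of 7.5 seconds, which would then be applied to every segment experiencing the same time shift.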

[0082] The technique of detecting predetermined signal patterns may be used to establish not only the timing but also the identity of a segment of a sequence of segments. For example, one or more unique "signatures" may be extracted or derived from a sequence of programming segments from a particular source. The metadata for individual segments may then include values that specify a time offset from the signature marker and, in that way, uniquely identify the segment.

[0083] The technique of identifying segments by means of "signatures" may be used when the stream from which the metadata was derived differs from the stream recorded by the consumer. For instance, if a local broadcast changed the timing of a broadcast program in order to introduce advertising of different lengths, or to add locally focused content not included with the version from which the metadata is made, problems would arise. As another example, the metadata might describe segments within a pay-per-view movie that might be received at different times by different users. In these circumstances, "signatures" or "content-based time stamps" may be used to associate metadata with the stored content.

[0084] When the metadata is created, a "signal pattern," or "fingerprint" extracted or derived from the content is used to identify a known time position in the "parent" copy of the version from which the metadata is created. This fingerprint or pattern may also uniquely identify the parent copy, distinguishing it from other content. This fingerprint exists at a measurable time offset from an "index point" in the parent copy used to associate metadata with the content. For instance, if the metadata were marking the beginning of an advertising segment, the fingerprint should be within and near the start of that advertising segment. Alternatively, the fingerprint to be detected to establish the time mark may be within only the first of a sequence of segments, with the first and remaining segments having start and end times expressed by offsets from the single time mark.

[0085] Metadata used to subdivide programming content may take a variety of forms. It may specify the position of markers, which delimit individual segments within a programming sequence by, for example, specifying byte offsets in a file of digital programming data, or by specifying the time position relative to some reference time when segments begin or end.

[0086] Alternatively, metadata may specify identifiable signal characteristics or "signatures" within a programming signal stream. These signatures may be detected to establish the time or data position of markers that may then be used as a base reference for data or time offsets which delimit the programming into segments. Such identifiable signal characteristics may occur naturally within the programming (such as scene changes in a video signal indicated by blank frames, the appearance of a new voice or other detectable signal pattern, or periods of silence) or may be created by ancillary signals inserted into the program stream or in a parallel transmission to serve as markers. Such ancillary signaling may take the form of identification tones, framing signals, digitally expressed data, and the like.

[0087] Using pattern-matching techniques, each piece of content stored at 103 or 143 may be compared to a specific fingerprint signature. When a match is found, segments occurring before or after the matching pattern may be identified at both the remote site where metadata is created and at the user location where metadata from the remote site is associated with the corresponding received broadcast segment. Multiple fingerprints may be used in order to continually synchronize the two versions.
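
The fingerprint matching described above may be sketched in Python as follows. The sketch treats the content as a sequence of integer sample values and uses a simple sliding-window comparison; practical fingerprinting would use a more robust signal feature, and the function names, the tolerance parameter and the sample data are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def find_fingerprint(stream: Sequence[int], fingerprint: Sequence[int],
                     tolerance: int = 0) -> Optional[int]:
    """Return the index at which the fingerprint first matches, or None."""
    n = len(fingerprint)
    for i in range(len(stream) - n + 1):
        if all(abs(stream[i + j] - fingerprint[j]) <= tolerance for j in range(n)):
            return i
    return None


def resolve_segment(stream: Sequence[int], fingerprint: Sequence[int],
                    start_offset: int, end_offset: int) -> Optional[Tuple[int, int]]:
    """Convert offsets relative to the fingerprint into positions in this copy."""
    pos = find_fingerprint(stream, fingerprint)
    if pos is None:
        return None
    return pos + start_offset, pos + end_offset


# The local copy carries three extra leading samples (e.g. locally inserted
# content), so absolute positions differ between the two copies.
remote_copy = [0, 0, 5, 9, 3, 1, 1, 7]
local_copy = [4, 4, 4] + remote_copy
fingerprint = [5, 9, 3]
```

Because the segment is expressed as offsets from the fingerprint rather than as absolute positions, the same metadata resolves to the correct span in both copies even though the local copy is shifted.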

[0088] The viewing habits of users as revealed by usage logs may be analyzed to subdivide programming into logical segments. With a large enough base of users, a profile of viewing could be constructed for a given program which would tend to indicate when users skipped particular segments, or used the mute control to silence particular segments, and to further identify segments which held viewers' attention. This type of observed behavior could be combined with other techniques, such as blank frame, scene and voice change detection, and analysis of the closed caption text to further automatically determine the boundaries between logical segments.
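
One way in which usage logs could suggest segment boundaries is sketched below in Python. The sketch assumes that the logs have already been reduced to a list of program times, in seconds, at which individual users pressed skip or mute; the function name, bucket width and threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def likely_boundaries(skip_times_s: Iterable[float], bucket_s: int = 5,
                      min_users: int = 3) -> List[int]:
    """Return the start times of buckets in which at least min_users skipped or muted."""
    counts = Counter(int(t // bucket_s) for t in skip_times_s)
    return sorted(bucket * bucket_s for bucket, n in counts.items() if n >= min_users)
```

Times at which many users act together are treated as candidate boundaries, which could then be combined with blank frame or scene change detection as noted above.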

[0089] The segment boundaries chosen by automated techniques may be refined by a human editor who makes adjustments to the timing of the automatically selected boundaries as needed. Thus using automated techniques, it is possible to subdivide broadcast programming into logical segments and to provide a figure of merit rating which can be sent to those who view the same programming on a delayed basis to assist them in making program viewing and recording selections, and, if desired, to automate those selections in whole or in part.

[0090] Storing Metadata at 113 and 133

[0091] As noted earlier, metadata describing the segments identified during the parsing process at 105 may be created at 111 in a variety of ways and stored at 113 for potential distribution to users. In addition, metadata created by users may be received via the communications facility 130 to supplement or replace the metadata created at 111.

[0092] Metadata created by users may be shared directly between users. When shareable metadata exists at a user location, it may be "registered" by supplying its resource address (such as an Internet URL) to the remote location which then relays the URL to other users who directly access the descriptive metadata from the other user's metadata storage 133 in a peer-to-peer transfer. In this form, the remote facility shown in FIG. 1 operates as a registry or directory that permits users to share descriptive metadata about broadcast programming with one another on a community basis.

[0093] The remote facility may subdivide available broadcast programming into segments as previously described and then associate each segment with references or pointers to metadata created by users and hosted on users' computers or on an available storage resource (including, for example, storage space made available at 113 for storing metadata).

[0094] As an alternative, the metadata provided by users may include segment identification information. For example, a user may identify a segment of programming by marking its beginning and end, and then create metadata which describes, rates or classifies that segment. Programming at the user location creates identification metadata for the segment using any of the techniques discussed earlier; for example, by extracting a unique fingerprint from the identified programming and transmitting this fingerprint together with start and end offsets, or by identifying the programming source together with the time stamp information specifying the times at which the beginning and end of the segment were originally broadcast.

[0095] The user may review metadata supplied by other users and presented as a program guide to the available stored programming. Before the descriptive metadata from other users is displayed, the segment identification portion of the received metadata may be compared with the programming content stored at 143, 147, 153 or 163 (or with metadata stored at 133 which identifies the content available to the user). In this way, only that descriptive metadata from other users which describes available programming need be reviewed.

[0096] Alternatively, a viewer may transmit a request to the remote facility for additional information about a particular program (which may include multiple segments), or the preferences of the user as stored in 117 may be expressly stated by the user or derived from the user's viewing history. These requests and/or preferences stored at 117 may then be used at 115 to select desired metadata (including references to metadata stored elsewhere) in the store 113 for transmission to the requesting user.

[0097] Thus, the metadata which is created by and shared among users may take one or a combination of the following forms:

[0098] 1. Qualitative (rankings, reviews, etc.);

[0099] 2. Descriptive (summary, topics, etc.);

[0100] 3. Segment identifications (start time, elapsed time, ending time, source, detectable characteristic, ancillary codes); and

[0101] 4. Cross-references or pointers to metadata stored at addressable resource locations, including metadata created and hosted by other users.
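
The four forms of user metadata listed above could be carried in a single record, as the following Python sketch suggests. The class name, field names and field types are hypothetical; the description above does not prescribe a particular record layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SegmentMetadata:
    # Form 3: segment identification
    source: str
    start_time: str
    end_time: str
    # Form 1: qualitative
    rating: Optional[int] = None
    review: Optional[str] = None
    # Form 2: descriptive
    summary: Optional[str] = None
    topics: List[str] = field(default_factory=list)
    # Form 4: cross-references to metadata hosted elsewhere
    links: List[str] = field(default_factory=list)
```

Only the identification fields are required in this sketch, so that a user may contribute any one form or a combination of forms for the same segment.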

[0102] Metadata that includes the URL of a World Wide Web resource provides a robust mechanism for associating the content of particular segments of broadcast programming to both additional information and related interactive transactions. For example, metadata may be associated with programming that permits viewers to learn more about or to purchase products or services related to the programming content. As described above, individual users may also create addressable resources, such as Web pages, and associate links to those resources with viewed programming segments. For example, a fan club for a particular actor might create a Web site devoted to that actor, and then share metadata containing the URL to that Web site with other viewers.

[0103] The user's ability to create and share metadata that describes, classifies or relates to selected broadcast programming segments thus enables users to create a community surrounding those segments in which a rich variety of information exchanges and transactions can occur. Users can, in effect, use the subject matter of broadcast programming as a public bulletin board upon which to post comments about the program, ratings and descriptive data which can be used as a basis for indexing and retrieving program content, and for linking in related information from other sources, or for conducting a marketplace by posting offers to sell and to buy goods or services relating to or suggested by program content.

[0104] A "Community Markup" system (here called "CM") may be implemented that serves two purposes: it may be used as a way to develop markup data for sources of program information that may have an insufficiently large audience to justify the creation of markup by a commercial enterprise, or to improve the quantity or quality of markup data offered by commercial sources.

[0105] To optimize the benefit of the community markup, program guide data may be made available to potential users to identify what stations to record. As users cannot go back after a broadcast and record it, this method would ensure that the maximum number of recorded copies will be available both for markup and playback with any CM effort.

[0106] CM can also be used to improve previously produced markup information. For example, if the markup does not accurately reflect the extent to which an announcer may have "talked over" a song, users will have editing tools available to them to alter the placement of the song delimiters and excise the talkover. The CM system will allow users to join a community whereby they will be able to upload their improved markups to a central server at 113 so that other users may access them. A "barter" system may be employed so that, when a user creates original markup data, he or she would then be entitled to receive markup data from other users, potentially avoiding the free-rider problem.

[0107] Improved markups may be downloaded and used to improve previously recorded songs or other content stored at 143, 147 or 153 in an automatic mode. Thus, even if several days elapse before the improved markup is available, the existing recording library would be automatically upgraded. This upgrading of the library would be performed transparently to the user.

[0108] As the originally recorded material is still in local storage, and only the metadata defining the playback markers is altered as a result of the new metadata, the recipient of community improved markup could always "undo" the automatic marker movement and restore the original recording and associated splits.

[0109] In cases where the system receives multiple markups, they can be averaged together for greater accuracy with outliers being deleted. This averaging function can be performed either at the server based on metadata received and stored at 113, or at the user location based on metadata received and stored at 133.
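
The averaging of multiple markups with deletion of outliers may be sketched in Python as follows. The rule used here, which discards any submitted marker lying more than a fixed distance from the median before averaging the remainder, is only one possible choice and is an assumption of this sketch, as are the function name and the default threshold.

```python
from statistics import mean, median
from typing import List


def consensus_marker(times_s: List[float], max_dev_s: float = 2.0) -> float:
    """Average the submitted marker times after discarding outliers."""
    mid = median(times_s)
    kept = [t for t in times_s if abs(t - mid) <= max_dev_s]
    return mean(kept)
```

For submitted start markers of 12.0, 12.5, 11.5 and 30.0 seconds, the value 30.0 is discarded and the consensus marker is 12.0 seconds. The same function could be run at the server on metadata stored at 113 or at the user location on metadata stored at 133.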

[0110] Note that community created markup would not necessarily have to be stored on a central server. Markup data can be stored solely on user machines and shared via peer-to-peer transfers (e.g. using an architecture of the type employed by Gnutella). In this environment, users would employ a shared directory, which would identify metadata about a recording they had made, which exists in storage at 133 at another user location.

[0111] Community Markup (CM) may be created as a byproduct of the user's use of locally generated metadata for creating a personalized program library. For example, the user may record a lengthy radio broadcast from a favorite station and then select particular songs for inclusion in a personal library, either by using markup signals provided by a remote markup source, or by using the available editing tools at 135. The songs which are identified may contain DJ talkover at the beginning or end of the song. In that case, the user may employ a one-step editing feature that permits the user to listen to a song and, when a transition occurs from talk to music, or vice versa, simply click on a "scissors" button which moves the start point or end point of play for that song so that, the next time it is played, the new start and/or end point takes effect. Importantly, the talkover is not erased and the play marker is merely moved. If the user did not time the use of the scissors very well, he can hit an "undo" button and redo the clipping process.
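
Because the "scissors" function moves play markers without erasing content, it can be modeled as a small record with a history of prior marker positions, as in the following Python sketch. The class and method names are hypothetical, and marker positions are assumed to be in seconds from the start of the stored recording.

```python
from typing import List, Optional, Tuple


class PlayMarkers:
    """Start and end play markers for one song; the recording itself is untouched."""

    def __init__(self, start_s: float, end_s: float) -> None:
        self.start_s = start_s
        self.end_s = end_s
        self._history: List[Tuple[float, float]] = []

    def scissors(self, start_s: Optional[float] = None,
                 end_s: Optional[float] = None) -> None:
        """Move one or both markers, remembering the previous positions."""
        self._history.append((self.start_s, self.end_s))
        if start_s is not None:
            self.start_s = start_s
        if end_s is not None:
            self.end_s = end_s

    def undo(self) -> None:
        """Restore the markers as they were before the most recent cut."""
        if self._history:
            self.start_s, self.end_s = self._history.pop()
```

Since only the markers change, the same record could be shared with other users as markup, and a recipient could likewise undo a marker movement received from the community.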

[0112] At any time, the user may elect to share the locally stored markup signals with others by transferring that markup to the server for storage at 113 where it is combined with the markup produced by others for that station and time frame, or by transferring the markup to another user with a peer-to-peer transfer. In this way, not only is the markup shared which accurately identifies desirable programming, the markup also operates as a recommendation for that section and, when aggregated among many users, offers the ability to identify and share the "best of today's programming" on a particular station.

[0113] A mechanism related to the "scissors" function described above would enable a given program segment (e.g. song) to be "split out" from the original program recording. Because the available metadata may not accurately identify the precise beginning and end point of a given song, a predetermined duration of programming is included at both the beginning and end of each song as identified by the preliminary metadata. This extra time provides "running room" to make sure that every program has at least the entire rendition in it. Since this extra length could include material from the program segment behind or ahead of the segment being edited, the interstitial material in the nebulous space between songs is duplicated and added to both songs as defined by the metadata. The user may then use the editing means, including the "scissors" function noted above, to provide a final adjustment to the start and end time. When program segments are permanently stored in a selection library, the added material excluded by the final edit may nonetheless be retained at both the beginning and ending to preserve the ability to adjust the start and end points even after the selected program segment is persistently stored in the library.

[0114] Programming may be described, classified and rated using metadata formats. Standard rating systems have been widely promulgated using the World Wide Web Consortium (W3C) Platform for Internet Content Selection (PICS). The PICS specification enables labels (metadata) to be associated with content and was originally designed to help parents and teachers control what children access on the Internet, but also facilitates other uses for labels, including code signing and privacy. PICS labels, and other metadata, may be advantageously expressed using the W3C's Resource Description Framework (RDF) which integrates a variety of web-based metadata activities including sitemaps, content ratings, stream channel definitions, search engine data collection (web crawling), digital library collections, and distributed authoring, using XML as an interchange syntax. Detailed tutorial information and formal specifications for PICS and RDF are available on the World Wide Web at http://www.w3.org/pics/ and http://www.w3.org/RDF/ respectively.
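
The addition of "running room" around a song may be sketched in Python as follows. The function name and parameters are hypothetical; positions are assumed to be in seconds from the start of the original recording, and the padded span is clamped to the limits of that recording.

```python
from typing import Tuple


def with_running_room(start_s: float, end_s: float, pad_s: float,
                      recording_len_s: float) -> Tuple[float, float]:
    """Extend a song's preliminary boundaries by pad_s on each side."""
    return max(0.0, start_s - pad_s), min(recording_len_s, end_s + pad_s)
```

When one song is marked as ending at 300 seconds and the next as beginning at 300 seconds, a pad of 10 seconds stores the first through 310 seconds and the second from 290 seconds, so that the interstitial material is duplicated in both.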

[0115] Storing User Data at 117

[0116] Whether the metadata which relates to programming segments is created at the remote source or at one or more user locations, it is frequently desirable to organize or filter the metadata so that the user can more easily obtain the benefit of that metadata which best fits the needs or desires of the individual user.

[0117] One mechanism for limiting the amount of metadata actually presented to the user is to simply store all received metadata at 133 and then to employ means for sorting and/or indexing the stored metadata so that desired metadata can be located in response to the user's specifications. As an alternative, the user's specifications may be uploaded via the communications facility 130 and stored at 117 at the remote facility. The user's specifications or preferences as stored at 117 are then used at 115 to select only that metadata which best fits the user's needs for transmission to the user's metadata storage at 133.
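
The selection of metadata against stored user preferences may be sketched in Python as follows. The sketch assumes that each metadata item carries a list of topic labels and that the user's preferences are a set of such labels; the function name, the dictionary keys and the overlap-count scoring are all hypothetical.

```python
from typing import Dict, List, Set


def select_for_user(metadata: List[Dict], preferred_topics: Set[str],
                    limit: int = 10) -> List[Dict]:
    """Return up to limit items, ordered by how many preferred topics each matches."""
    scored = [(len(preferred_topics & set(item["topics"])), item) for item in metadata]
    scored = [pair for pair in scored if pair[0] > 0]  # drop items with no match
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:limit]]
```

The same function could be run at the user location against metadata stored at 133, or at the remote facility at 115 so that only the selected items are transmitted.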

[0118] The user's preferences may be derived from his or her activity. For example, the particular programs a user chooses to save or view may be monitored to determine the user's apparent content preferences. Preference data may be produced at the user's location and stored with other metadata in the store 133, from which it may be used locally or sent to the remote location for use there. Alternatively, "user log" data recording the user's activity may be transmitted to the remote location where it is analyzed to produce preference data.

[0119] Metadata which is derived from an analysis of the recorded viewing or editing choices made by other viewers, which may be termed "implicit metadata," includes values such as: the number of users with whom the viewer had common tastes who watched a particular program, or metadata based on analyzing such events as (a) who surfed out of, or did not complete watching, a certain show, or never recorded it in the first place; (b) who took a certain amount of time to watch the recording (if it is a preferred program to a viewer, it will be viewed sooner); or (c) what percent of the program, on average, was skipped.

[0120] Once the preference data are determined, they may be used in a variety of ways:
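
The derivation of "implicit metadata" from viewing logs may be sketched in Python as follows. The sketch assumes one log entry per viewer for a given program, with hypothetical keys recording whether the viewer completed the program, how many hours elapsed before the recording was viewed, and what fraction of the program was skipped.

```python
from typing import Dict, List


def implicit_metadata(logs: List[Dict]) -> Dict[str, float]:
    """Summarize viewer behavior for one program across a non-empty list of logs."""
    n = len(logs)
    return {
        # (a) share of viewers who surfed out or did not finish
        "abandon_rate": sum(not entry["completed"] for entry in logs) / n,
        # (b) how soon, on average, the recording was viewed
        "mean_hours_until_viewed": sum(e["hours_until_viewed"] for e in logs) / n,
        # (c) fraction of the program skipped, on average
        "mean_fraction_skipped": sum(e["fraction_skipped"] for e in logs) / n,
    }
```

Restricting the input to viewers whose past choices resemble those of a given user would yield values of the kind described above for users with common tastes.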

[0121] a). Preference data may be used at 151 to select or discard particular received broadcast segments so that only those which are more likely to be of interest to the user are saved, thus conserving storage space;

[0122] b). Preference data may be used at 161 to modify program content by, for example, inserting, interleaving or substituting advertising or other materials with received program materials based on the user's interests;

[0123] c). Preference data may be used at 171 to assist the user in determining which received segments to play, either by automatically presenting those segments most likely to be of interest to the user, or by presenting a program guide containing or highlighting segments of interest from which the user makes the final program selection;

[0124] d). Preference data may be used to help the user select program segments which are made the subject of additional, user-created metadata which is then used locally (e.g. bookmarks or notes for the user's own use) or uploaded to the remote location or shared with other users as noted above;

[0125] e). Preference data may be used at the remote location where it is stored at 117 and used at 115 to select metadata for transmission to the user;

[0126] f). Preference data for individual users or combined preference data from many users may be used at 103, 107 and 113 to determine which programming content and descriptive metadata should be stored, and when previously stored content and metadata should be discarded, to make the most efficient use of limited storage space in light of user demand; and

[0127] g). Preference data may be collected based on the usage of, or ratings supplied for, the metadata itself. In this way, users may rate the perceived value of metadata created automatically or by the editors at the remote facility (at 111) and this rating data may then be used at 115 to select not only programming of particular interest but also to select the metadata deemed to be of the most value.

[0128] Note that the metadata created at 111 and/or 180, and stored at 113 and/or 133, may include metadata used to display an electronic program guide (EPG) for the user which displays in some convenient format information concerning the content of available broadcast programming. Such displayed metadata associates items of descriptive information with one or more program segments. It is thus frequently advantageous to provide the user with means for associating user-created comments, notes, reviews, ratings, and the like by using the EPG display to identify and select the program segments with which the newly created metadata is associated. Metadata created in this way is thus readily shared with other users who share comparable EPG metadata by the simple mechanism of permitting a user to request additional information about a displayed program guide item.

[0129] As noted earlier, metadata created by individual users may be simply stored locally at 133 as an Internet accessible resource. Web crawling "spider" programs executing on remote computers may then retrieve and index this metadata and then act as "search engine" directories that may be publicly accessed to locate metadata of interest. For example, a search for "Stardust" might locate metadata describing an audio recording of the song by that name, biographic programming about the composer or performing artists, and the like. Thus, the descriptive metadata created by professional editors and/or users can form the basis for finding and enjoying content that would otherwise be difficult to index because of its non-textual character.

[0130] Metadata can be developed to characterize individual program segments by processing log file data representing choices made by users in selecting and/or abandoning programs, and from program ratings expressly provided by users. When aggregated by retrieving and combining such data from many users, and when further correlated with demographic data about the same users, rating information can be provided which tends to indicate what other viewers having similar backgrounds and similar past preferences preferred among the currently available program materials.

[0131] Ratings data compiled from actual user selections may provide unique information on how specific consumers react to specific songs at specific times. Thus, a recording studio might release a new single, and immediately thereafter determine how many listeners in a certain demographic had deleted, saved, or listened to that song multiple times. Express song rating data provided by users could be used in addition to or instead of implicit rating data to identify specific program segments that were well received or uniformly disliked.

[0132] When programming is broadcast in one geographic area before being broadcast in another, or when programming is repeated, the viewing and listening behavior of users exposed to the earlier broadcast can be used to provide rating information for later users. Thus, the habits of TV viewers on the east coast of the United States could be analyzed in advance of the later rebroadcast of the same programming on the west coast, so that ratings data tending to reflect which of the programs were preferred may be supplied to west coast viewers in advance. In addition, west coast viewers would have the benefit of advance reviews and summaries of programs created during the earlier broadcast. In the same way, any viewer using a personal video recorder (PVR) or other means for accessing program materials on a delayed basis could be aided in the selection of that program which they, as individuals, would be most likely to enjoy by the availability of rating and review metadata from earlier viewers having similar interests.

[0133] Content and Metadata Communications

[0134] The transfer of both content and metadata is illustrated at 130 in FIG. 1. As described here, both the remote location and the user location may receive and process programming signals (content) from a broadcast programming source. In addition, content may be sent from the remote location to the user location, and content may also be sent from the user location to the remote facility or to other users on a peer-to-peer basis. By whatever path is used, the content which is presented to users is made available to a large number of potential users, and the metadata which describes that programming material is created to aid those users (or particular users) to selectively record and view this programming material.

[0135] The metadata may be created at the remote facility and transferred on a selective basis to individual users, or it may be created by users and transferred to the remote facility for redistribution to other viewers, or it may be transmitted directly from user to user on a peer-to-peer basis.

[0136] The metadata may be transmitted with the programming content, or may be transmitted at a later time, or over a different communication pathway. In many program transmission systems, some of the available bandwidth is allocated to metadata, as typified by program guide channels or time slots provided by the vertical blanking interval (VBI) in a television signal. These existing pathways may be used to transfer the metadata contemplated by the present invention which contemplates, in many implementations, the transfer of metadata after the programming material has been broadcast but before the programming material is viewed on a delayed or time-shifted basis after having been recorded earlier.

[0137] Thus, as described here, the metadata may be created by editors or viewers who comment on or rate viewed material at the time of or after its initial broadcast, with the metadata being transferred to end users to facilitate the selection, recording and playback of desired material on a time shifted basis. In summary, the metadata need not be transmitted before or concurrently with the original broadcast, but may be created by early viewers and used by later viewers who watch the programming on a delayed basis, either because the version they watch was broadcast later or because the version they watch was previously recorded for later viewing.

[0138] Creating and Editing Metadata at the User Location

[0139] As previously noted, metadata may be both created (at 180) and edited (at 135) at the user location. This locally created metadata may be programmatically derived from the viewing choices made, without requiring any additional effort by the user, or it may be the result of interactive choices made by the viewer. For example, a viewer may receive metadata from the remote source which takes the form of an electronic program guide describing broadcast programming, and with respect to each item of such programming, the locally generated metadata may indicate whether or not given program segments had been (a) selected for storage for potential later showing, (b) selected for actual viewing, (c) viewed for a specified period before being terminated, (d) saved for later repeat viewing after having been viewed, (e) expressly rated by the viewer, or (f) made the subject of a written text review. This locally generated metadata reflecting the user's use of or assessment of the programming materials, as placed in storage unit 133, is then uploaded to a remote processing site for distribution to other viewers or simply placed in an addressable location from which it may be retrieved for processing by one or more rating services or by other viewers.

[0140] When the user stores broadcast programming in the store 143, the user has no control over the incoming content. To more easily control what is saved for possible future playback, the user may be provided with a "Never Again" button. Whenever the user is listening to or editing a program segment, such as a song, or has highlighted that program in a library program listing, the user may press the Never Again button to prevent that song from being recorded or, if recorded, to automatically prevent that song from being presented to the user in a list of available songs. Alternatively, pressing the Never Again button may also permit the user to prohibit the listing of any song by a particular artist, of a particular song by any artist, or further editions of a serialized program.

[0141] Over time, use of the Never Again button may develop a "negative screen" of preferences for that user, which may be used to automatically eliminate or reduce the number of program segments or songs related to a program or song excluded by the Never Again button. The Never Again button may also be one of the several ways that users will be able to accumulate preference information that can be used to control playlists transmitted from the server or created locally by the user. Note that, like other metadata, the Never Again list is kept as a separate file and users may undo a Never Again designation at any time so that it will have no further effect on existing or future recorded content.

[0142] Instead of a negative filter, a huntlist, or "positive" filter, may be used as well. With a huntlist, a user identifies which songs or which artists he wants the system to capture. In addition, a huntlist may contain "songbots" (algorithms that search for described types of songs that the user wishes to have captured). A typical songbot could be "All Top 40's from the '70's". Other huntlists may be created using collaborative filtering techniques. Huntlists may be compared with metadata developed at a remote server (with the comparison occurring at either the server or the user location) to flag desired songs as they arrive from the broadcast source and are stored at 143, or they may be used as a sieve, whereby hunted songs are saved and non-hunted songs are deleted and never presented to the user. When songs are "found" by a huntlist created at a remote server, an email may be generated telling the user that new songs are now in the jukebox. When the huntlist operates locally, a dialog box or the like may be used to alert the user to the presence of the desired song or program segment. In addition, the user may access his huntlist and see through a visual cueing of some kind (different colors for instance) which songs have been captured and which have not.
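The interaction of the Never Again "negative screen" and the huntlist "positive" filter described above might be sketched as follows. This is only an illustrative sketch: the names Song, Screens and classify, and the three result labels, are assumptions introduced here and do not appear in the system described.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Song:
    title: str
    artist: str


@dataclass
class Screens:
    # Never Again entries (hypothetical layout): specific songs,
    # every song by an artist, or a given title by any artist.
    never_songs: set = field(default_factory=set)    # {(title, artist)}
    never_artists: set = field(default_factory=set)
    never_titles: set = field(default_factory=set)
    # Huntlist entries: artists or titles the user wants captured.
    hunt_artists: set = field(default_factory=set)
    hunt_titles: set = field(default_factory=set)


def classify(song, screens, sieve=False):
    """Return 'discard', 'flag' or 'keep' for an incoming song.

    The negative screen is checked first. With sieve=True the huntlist
    acts as a sieve, so non-hunted songs are discarded as well.
    """
    if ((song.title, song.artist) in screens.never_songs
            or song.artist in screens.never_artists
            or song.title in screens.never_titles):
        return "discard"
    hunted = (song.artist in screens.hunt_artists
              or song.title in screens.hunt_titles)
    if hunted:
        return "flag"
    return "discard" if sieve else "keep"
```

Because both lists are held apart from the recorded content, undoing a Never Again designation in this sketch amounts to removing an entry from the corresponding set.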

[0143] The huntlist may be compared with metadata describing the programming broadcast by a plurality of different stations to identify stations and times when desired programming is most likely to occur, and a program controlled tuner may then be used to automatically capture broadcast content from the identified stations at the identified times. When program segments or songs identified on a huntlist are available for purchase, a "Buy" button or a "Sample" button, which allows a user to hear a sample of a song, may be presented to the user to enable the purchase to be evaluated and executed if desired.

[0144] Automatically Bookmarking Programming Content

[0145] The system contemplated by the present invention may further include a mechanism at 180 for automatically defining and generating bookmarks which may be applied to the content stored at 147, 153, and/or 163 to facilitate navigation of the stored content and/or for personalizing content as performed at 151, 161 and/or 171 to thereby selectively control the playback of programming materials at 190.

[0146] The leading objectives of the automatic bookmarking mechanism contemplated by this aspect of the present invention are to:

[0147] 1. Automatically specify segment start and stop delimiter positions (at 145);

[0148] 2. Automatically categorize the segments;

[0149] 3. Automatically create descriptors for the segments;

[0150] 4. Automatically eliminate redundancies if necessary at 151 and/or 171; and

[0151] 5. Automatically concatenate related pieces of a story at 151, 161 and/or 171 to implement one or more different ways to watch television.

[0152] Content that is well suited for these bookmarking techniques consists of segment-able programming like news, sports, or shopping programming, but some techniques apply to other types of programming as well. The automatic bookmarking mechanism may be implemented with a variety of available technologies, including natural language processing, voice recognition, face recognition, sound recognition, and probability theory.

[0153] The bookmarking system can operate on the client side as noted above, or at 111 at the central facility (which can include at the broadcaster's facility) to create bookmarking metadata that may thereafter be downloaded to the client with the program (if the analysis work is done ahead of time) or via a separate channel such as the Internet. The bookmarking metadata may be created ahead of time before broadcast, or more likely, after the broadcast when there is a short window of time to create metadata before the viewer watches time-shifted material.

[0154] Creating Bookmarks From Closed-Captioned Text

[0155] The preferred system may make extensive use of the closed-captioned text. The closed caption text will be fed into a Natural Language Processing Engine (NLPE) in order to interpret the meaning of the material. When the system determines a change in topic, a marker is set. The system will also attempt to categorize the material and generate a short "slug" describing the material.

[0156] The closed caption material is typically fed into the NLPE system in blocks, as the system can process the material faster than it is broadcast. As a topic break might lie close to a break between blocks, the system processes overlapping blocks as needed to be sure no breaks fall between blocks, or close to the endpoint of a block. The closed caption text, when fed through the NLPE, may also be used to generate a caption for each individual segment as well as to categorize the segment.
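One way to form the overlapping blocks mentioned above is sketched below. The function name overlapping_blocks and the parameters block_size and overlap are illustrative assumptions; the description does not specify block sizes or the amount of overlap.

```python
def overlapping_blocks(items, block_size, overlap):
    """Split a sequence of caption lines into blocks that share
    `overlap` items with their neighbor, so that a topic break near
    a block boundary falls well inside at least one block."""
    if not 0 <= overlap < block_size:
        raise ValueError("overlap must be >= 0 and smaller than block_size")
    if not items:
        return []
    step = block_size - overlap
    blocks = []
    start = 0
    while True:
        blocks.append(items[start:start + block_size])
        if start + block_size >= len(items):
            break  # this block already reaches the end of the captions
        start += step
    return blocks
```

For ten caption lines with block_size=4 and overlap=2, the blocks begin at lines 0, 2, 4 and 6, so every boundary between two blocks lies in the interior of a third.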

[0157] Closed captioning can be done live or ahead of time. When done ahead of time, it is synchronized quite tightly, within a fraction of a second, with the program content. For live captioning, tight synchronization is not typical, and the delay can be on the order of a few seconds. When loosely synchronized captioning exists, the system may automatically attempt to re-synch the captioning with the video after recording. One way to do this would be just to use some measure of average delay for that type of content and adjust the captioning accordingly. A better method employs face recognition or shape recognition to analyze the video content to determine when a person is speaking by focusing on lip movements. The captioning could be re-timed to match up with the end of a speaker's as often as needed. Alternatively, voice recognition could also be used when the captioning reflects the spoken sound track. Note that the accuracy of the voice recognition would not have to be very high since, if a definitive match was found every few words, the time delay could be re-adjusted until a subsequent match is found.
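The re-timing idea above, in which an average delay is used until a definitive match is found and each match then resets the delay until the next one, might be sketched as follows. The function name retime_captions and the representation of matches as (caption_time, audio_time) pairs are assumptions made for illustration.

```python
from bisect import bisect_right


def retime_captions(caption_times, anchors, default_delay):
    """Shift caption timestamps back toward the spoken audio.

    anchors is a list of (caption_time, audio_time) pairs, sorted by
    caption_time, where recognition found a definitive match. Each
    caption uses the delay measured at the most recent anchor at or
    before it; captions before the first anchor use default_delay
    (an average delay assumed for this type of content).
    """
    anchor_times = [c for c, _ in anchors]
    retimed = []
    for t in caption_times:
        i = bisect_right(anchor_times, t) - 1
        if i < 0:
            delay = default_delay
        else:
            delay = anchors[i][0] - anchors[i][1]
        retimed.append(t - delay)
    return retimed
```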

[0158] The bookmarking mechanism may use speech recognition in combination with a database of navigational words that commonly indicate that a break or segue is in process. These would include words or phrases such as "coming up next", "next week", "Over to you, Bill", etc. "When we return" would signal the start of an ad. Questions might often indicate a change in direction of the content. When such a phrase was located, a marker would be generated. Alternatively, the closed caption text may be scanned, or voice recognition software may be used to process recorded speech, to find these words and phrases.
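Scanning timed caption text for navigational phrases might look like the sketch below. The phrase list is taken from the examples above; the function name find_cue_markers and the (time, text) caption format are illustrative assumptions.

```python
# Navigational phrases from the examples above, lowercased for matching.
CUE_PHRASES = ("coming up next", "next week", "over to you", "when we return")


def find_cue_markers(captions, phrases=CUE_PHRASES):
    """captions is a list of (time_in_seconds, text) pairs.

    Returns a list of (time, phrase) markers, one for every phrase
    found in a caption line, compared case-insensitively.
    """
    markers = []
    for time, text in captions:
        lowered = text.lower()
        for phrase in phrases:
            if phrase in lowered:
                markers.append((time, phrase))
    return markers
```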

[0159] The manner in which users view a given program may be monitored to position automatically generated bookmarks. The video playback system typically includes a fast-forward mechanism that permits a user to rapidly search through a program until a passage of particular interest starts, at which time the user returns the player to normal viewing speed. Typically, the image can be seen during this movement and sometimes the audio can be heard as well, particularly if it is time-scaled to give the audio pitch control. This fast forwarding activity may be monitored to identify the beginning point of a segment of interest. The system is preferably able to collect and aggregate such bookmark position data on an anonymous basis, perhaps just from a minority of the total users, to identify the points in each piece of content where users frequently resume normal playback speed after fast-forwarding to a desired position. Note that, in general, the important bookmark to get right is the beginning of a segment. The end of a segment normally takes care of itself as people often skip out before getting to the end, or if not, the end of one segment becomes the beginning of another. The point is that few viewers fast forward to end a given passage, but rather fast forward through a segment or sequence of segments until the beginning of the next desired segment is reached. Due to this fact, time scaling is a useful tool for finding segment beginnings. This is because some number of users will scale forward rapidly, still understanding most of what is being said if the audio is able to be heard, or will be able to view a fast motion version of video programming, and will then slow down when the interesting material starts to play. It is this inflection point we are looking for. It will indicate a change in interest level to most viewers and thus could serve as a source of auto generated bookmarks. 
The preferred system accordingly aggregates time scaling commands from a large number of different viewers to deduce segment beginnings as an average of a concentrated group of these fast-to-normal transition times.

[0160] Viewers will typically overshoot the actual beginning of the next segment as they cannot discern that a new segment has started until they watch or listen to a bit of it. Some percentage of viewers may go back and try to start at the exact beginning, at least for some segments. As a result, the best way to fine-tune the estimate of the location of the segment beginning point would be to estimate the average overshooting error and subtract that distance from the averaged or calculated segment beginning. This average error length could be found through empirical study, or deduced by, again, monitoring viewer behavior on the system and watching that small number of viewers who go back and rewind to get to the exact beginning of a segment. In general, the system described by this invention would wish to err on the side of starting the segment too early as opposed to starting within the segment.
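The aggregation of fast-to-normal transition times and the overshoot correction described above might be sketched as follows. The gap-based grouping, the function name deduce_segment_starts, and the default values of max_gap, min_viewers and overshoot are all assumptions chosen for illustration; the description leaves these to empirical study.

```python
def deduce_segment_starts(resume_times, max_gap=5.0, min_viewers=3,
                          overshoot=2.0):
    """Estimate segment beginnings from the playback positions (in
    seconds) at which viewers returned to normal speed.

    Positions closer together than max_gap are treated as one
    concentrated group. Groups with fewer than min_viewers members
    are ignored. Each estimate is the group average minus the assumed
    average overshoot, which errs toward starting slightly early.
    """
    starts = []
    cluster = []

    def flush():
        if len(cluster) >= min_viewers:
            mean = sum(cluster) / len(cluster)
            starts.append(max(0.0, mean - overshoot))

    for t in sorted(resume_times):
        if cluster and t - cluster[-1] > max_gap:
            flush()
            cluster = []
        cluster.append(t)
    flush()
    return starts
```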

[0161] Since large numbers of people fast-forward through ads at high speeds, aggregating the data around these clusters (dropping out users who are obviously fast-forwarding past the ad itself) would give a good indication of where the ad stopped. Since the average user stops a certain number of seconds after the ad ends, this average stop time, minus the average error, could be used to deduce the end of the ad and start of the next segment.

[0162] By the same token, the aggregation of data which distinguishes program segments which are frequently skipped by fast forwarding from those that are viewed normally can be used to identify popular segments. For example, a substantial number of viewers may fast forward through the Tonight Show to find the Top Ten segment. The system can learn to spot these clusters of segments or content through which other viewers have fast-forwarded, label them and pass them on to other users allowing them to skip over these unwanted segments instead of fast-forwarding through them.

[0163] In particular, Hot Spots would be most interesting if the comparative group matched the profile and preferences of the viewer. This would give it more of a collaborative filtering capability.

[0164] Another form of metadata that could be automatically generated from other viewers' actions identifies which segments elicited an interaction by other viewers with iTV functions. These might include an interaction with a Wink-like system whereby sport statistics are available over the data channel of the cable operator. For instance, a viewer might wish to focus on segments in a History Channel program where other viewers had accessed background information. Another example involves t-commerce and systems that allow viewers to purchase an item from the TV using the remote. In a home shopping channel, this sort of metadata could serve to guide a user to the hottest items to buy.

[0165] Sound Cues

[0166] For purposes of this section, it is useful to define a new type of content here called "rolling content." Whereas segmentable content includes news and weather and linear content includes shows like "Friends" and movies like "Gladiator," rolling content would include programming such as a soccer or hockey game, which is a hybrid. Programming with rolling content has periods of higher and lower interest, and some climaxes like goals, but the "breaks" are more analog in nature. Many cues indicative of a break in rolling content could be deduced by sophisticated audio recognition. Important sound recognition types would include laughter, applause, referee whistles, and crowd noise. Crowd noise for instance increases dramatically every time a home run is hit, or a shot on goal is taken in a soccer game. The system could understand how long it takes on average to develop a play in soccer that would cause a cheer and drop a marker in several seconds before each instance. In a comedy program like the David Letterman show, the "action" is expected to be continuous, so a marker would be dropped in after each instance of laughter, presuming that is the beginning of the next joke. A software algorithm might detect other types of sound information such as the level of excitement in a speaker's voice, or the quickness of speech. These variations could be transformed into bookmarks. Different algorithms could be developed for each type of sound, and vary by each show. The user could do further modification of the algorithms, for instance, deciding to watch 10 seconds rather than 30 seconds of content leading up to a crowd's roar. Alternatively, the system might "learn" preferences such as this by monitoring the specific user's use of the fast forward button or time scaling feature.
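Dropping a marker a fixed lead time before each rise in crowd noise might be sketched as follows. A simple loudness threshold stands in for the "sophisticated audio recognition" mentioned above; the function name crowd_noise_markers and its parameters are illustrative assumptions.

```python
def crowd_noise_markers(levels, threshold, lead_seconds, sample_period=1.0):
    """levels holds loudness samples taken every sample_period seconds.

    A marker is placed lead_seconds before each moment the loudness
    rises from below the threshold to at or above it. lead_seconds
    corresponds to the user-adjustable amount of content to watch
    leading up to a crowd's roar.
    """
    markers = []
    above = False
    for i, level in enumerate(levels):
        is_loud = level >= threshold
        if is_loud and not above:
            markers.append(max(0.0, i * sample_period - lead_seconds))
        above = is_loud
    return markers
```

For a comedy program, the same loop could instead place the marker after the loudness falls back below the threshold, reflecting the assumption that the next joke begins when the laughter ends.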

[0167] Recognizing Repeating Patterns

[0168] Multiple copies of the same show may be analyzed to see if patterns repeat themselves. These patterns might be in the video or audio and might signal the beginning of a segment. For instance, the appearance of the weather map might indicate the beginning of the weather report. The system would look for pattern markers that were spaced apart about the length of an expected segment. As stated earlier, the time scaling or fast-forward usage information could be used to confirm that these bookmarks are usable. In addition, if nobody is skipping forward to them, that tends to indicate they might not be correct.

[0169] Music Recognition

[0170] A music discriminator, that is a signal analyzer able to discriminate between music and talk and deduce when music is playing in the background, can be used to provide bookmarks. Music analysis may also be used to distinguish one type of music from another, and perhaps distinguish bands or songs. These algorithms could be useful for detecting breaks in a video show, as well. This technique could be particularly useful for detecting ads as many start with music.

[0171] For rolling content or linear content, detecting the type of music playing might be useful. In many cases, music is used to highlight the "essence" of a movie. In many movies, a characteristic type of music is played during each action scene or love scene, for instance. Metadata based on the type and location of this background music could be used to classify areas of content into different moods or types of content such as love scenes, action scenes, etc. A user could use this information to just play back these portions of the content.

[0172] Another form of similar metadata would be the frequency of scene changes. More scene changes would indicate more action. By the same token, the degree of motion in the image itself can be detected from the amount of redundant information that is dropped out in the encoding process. This information could be used to deduce or measure the degree of motion in the scene, information which could be used to deduce the "action level" in the scene, perhaps in conjunction with other indicators.

[0173] Character Changes

[0174] The preferred system would be able to detect the coming and going of characters, announcers, or actors in the programming. This could be done through face recognition technology or through voice recognition (where different people's voices are recognized regardless of what they are saying). In news shows, this would be particularly useful when one announcer hands off to another. For other types of programming it might help to automate our "Favorites" Way to Watch (which we typically describe as a way to track Tiger Woods through a golf tournament). Further logic, implemented using data generated from voice or face recognition, may be used to determine who was the anchor and who were the subsidiary reporters. The breakpoints could be focused on the times where the camera went back to the main announcer.

[0175] Visual Cues

[0176] Scene recognition (as opposed to scene change recognition) would be useful in deducing breakpoints. Similar to sound recognition, visual recognition could (either now or in the future) spot when two newscasters were using a split screen, when stock prices were on the screen, when a ball went into the basket, etc. Overall visual metrics, such as the amount of movement on a soccer field, could be indicative of a timeout or frantic action.

[0177] User-Generated Bookmarks

[0178] Another form of non-tagging-station-generated bookmarks would be for users to create their own bookmarks by clicking a button as they watched the programming. This could be related to the Save feature (see below), or merely to enjoy while re-watching the programming. The user could also have the option to categorize and label the segment if both beginning and end points were denoted. If enough users in the monitored sample bookmarked the same scene, the system could average these locations out to present a definitive mark to other users. In the same way, when a user bookmarks a spot in order to save a segment, this new viewer-generated location data could be used to create the deduced bookmark.

[0179] There are four types of viewer-generated bookmarking information: fast-forwarding or other analog motions through the video, bookmarking for later repeat viewing or showing to others, bookmarks made to send to a friend, and bookmarks made for purposes of saving. In this list, viewers can be assumed to have put the most thought into the actions later in the list. Viewers saving segments would therefore be presumed to have put the most thought into the exact placement of the bookmark. As a result, as data from these multiple sources is compiled and synthesized, extra weight would be put on the latter categories. The exact weightings could be determined through empirical testing. That is, an editor could study the video and determine the "proper" bookmark locations and then develop a model for using these data inputs in the most accurate fashion.

[0180] Note that once "deduced bookmarks" start to be presented to viewers, the system would cease to collect as much new information as viewers' navigation actions would then be based on the data being supplied. Therefore, the system would have to decide at what point enough field data had been collected before dispersing its deduced bookmarks.

[0181] Once deduced bookmarks were distributed to viewers, the system would monitor their usage. If some minimal number of viewers jumped to the given deduced bookmark but then shortly thereafter fast-forwarded a short distance, or re-wound a short distance, this would be interpreted by the system as an attempt to adjust the bookmark. This adjustment distance would then be used to re-adjust the distributed bookmark going to viewers for the first time, as well as used to re-adjust bookmarks that were "in the field", that is already distributed. Again, data coming from viewers known to be "careful adjusters" would be given extra weight.
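The re-adjustment of a distributed bookmark from viewers' small corrective jumps might be sketched as follows. The function name adjust_bookmark and the default values of min_viewers and max_adjust are illustrative assumptions; weighting for "careful adjusters" is omitted for brevity.

```python
def adjust_bookmark(position, adjustments, min_viewers=5, max_adjust=10.0):
    """position is the current bookmark location in seconds.

    adjustments holds signed distances (negative for a rewind,
    positive for a fast-forward) that viewers moved shortly after
    jumping to the bookmark. Only non-zero moves no larger than
    max_adjust count as attempts to correct the bookmark, and nothing
    changes until at least min_viewers such moves have been seen.
    """
    small = [a for a in adjustments if a != 0 and abs(a) <= max_adjust]
    if len(small) < min_viewers:
        return position
    return position + sum(small) / len(small)
```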

[0182] Certain viewers might be determined to have better skills in determining accurate skip points. This might be determined by looking at how well their marks clustered around the average location for a given markup point. The markup points offered by these users could then be given extra weight in the overall averaging process.

[0183] The averaging process could take into account multiple inputs: viewers' fast-forward stopping points, viewer-generated bookmarks, viewer-created segments that were saved, viewer adjustments, etc.
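A weighted average over these inputs might be sketched as follows. The weight values are placeholders that only preserve the ordering described earlier (saved segments weighted most, fast-forward stopping points least); the names SOURCE_WEIGHTS and weighted_bookmark, and the per-viewer weight used for viewers judged to be more accurate, are assumptions for illustration.

```python
# Placeholder weights; actual values would come from empirical testing.
SOURCE_WEIGHTS = {
    "fast_forward_stop": 1.0,
    "repeat_view_bookmark": 2.0,
    "send_to_friend_bookmark": 3.0,
    "saved_segment": 4.0,
}


def weighted_bookmark(observations, weights=SOURCE_WEIGHTS):
    """observations is a list of (source, time_seconds, viewer_weight).

    viewer_weight lets marks from viewers with a record of accurate
    placement count for more (use 1.0 for an ordinary viewer).
    Returns the weighted average location, or None with no input.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for source, time, viewer_weight in observations:
        w = weights[source] * viewer_weight
        total_weight += w
        weighted_sum += w * time
    if total_weight == 0:
        return None
    return weighted_sum / total_weight
```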

[0184] Aggregate User Feedback Used to Edit Breakpoints

[0185] Above we discussed how break points could be deduced by watching users' actions. Another way to use aggregated data is to watch how viewers use our proposed bookmarks. For instance, take the case of generating a break mark when the news announcer changes. This may not be the signal of a new story--it may just be the anchor handing off to a field person for a report. We could deduce that by watching how early users of the metadata don't skip at that break point, but watch the preceding section and go right on through this supposed next segment. If it were truly a break in the content, some percentage of people would be assumed to skip at that point or close to it. Therefore, if a bookmark is not used by some minimal percentage of people (who have watched the entire previous segment) as a launch pad to jump forward from, it would be assumed to not be a meaningful break. If no one uses a bookmark, then by definition it is not useful.

[0186] Correspondingly, if a high enough percentage of viewers skip out of the previous segment and then shortly skip out of this second segment, it could be assumed that the content is too similar and again, it is not a meaningful segment marker. Again, by definition, if there is an extremely high correlation between viewers skipping one section and then the second, the two are probably very closely linked and probably the same segment.

[0187] The exact percentages needed to make these decisions can be empirically tested as stated above.
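The two tests just described might be sketched as follows. The function names and the default thresholds (5 percent and 90 percent) are placeholders, since the exact percentages are left to empirical testing.

```python
def is_meaningful_break(watched_previous, skipped_at_mark,
                        min_skip_fraction=0.05):
    """Keep a bookmark only if enough of the viewers who watched the
    entire previous segment used it as a point to jump forward from.
    With no viewers observed yet, the bookmark is kept."""
    if watched_previous == 0:
        return True
    return skipped_at_mark / watched_previous >= min_skip_fraction


def should_merge(skipped_first, skipped_both, max_correlation=0.9):
    """Treat two adjacent segments as one when nearly everyone who
    skipped out of the first also shortly skipped out of the second."""
    if skipped_first == 0:
        return False
    return skipped_both / skipped_first >= max_correlation
```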

[0188] But in general, the system can be organic and self-correcting. For instance, field data can always be used to second-guess a decision made by the system. If the system, for instance, erases a marker, and then sees monitored viewers starting to fast-forward through material demarcated by the old erased marker, it can re-insert the marker.

[0189] Combining Different Metadata Using Bayesian Statistics

[0190] The NLPE will not always accurately segment the show. As such, it will be useful to combine this technique with others. Each technique will add additional information in determining the probability of a break. For instance, scene change analysis will be used to deduce when a scene change occurs. If such a change occurs close to where the closed caption analysis suggests there may be a topic change, then Bayesian statistical modeling will be used to predict the probability of a break.

[0191] Time-Based Data

[0192] Further data to add to the Bayesian analysis would include the time duration since the last break. Each program could have stored a frequency distribution of how often a topic change occurred. As the time-since-the-last-break increased towards, and then past, the average length of a segment, it increases the probability that an inference of a topic break is, in actuality, a real break. This time-based data would be added to the data synthesized by the Bayesian tool.

[0193] In other words, it's unlikely that CNN News would have a 10-second story. So the time-length factor would sharply mitigate the probability of the system producing a break after ten seconds even if the closed caption text and scene change analysis suggest such. On the other hand, segments rarely go past 2 minutes. So as the length of a segment approached a long duration, the "benefit of the doubt" would start to swing towards designating a break. Bayesian statistics is the methodology of revising probabilities based on new data.
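The time-length factor could, for instance, be derived from the stored distribution of past segment durations as an empirical hazard rate: among segments that lasted at least as long as the current one, the fraction that ended within the next few seconds. The sketch below assumes a plain list of durations in seconds and a hypothetical `window` parameter; the resulting value could serve as the prior in a Bayesian update.

```python
def break_hazard(durations, elapsed, window=10.0):
    """Estimate P(break within `window` seconds | no break for `elapsed` seconds).

    durations: past segment lengths for this program, in seconds (assumed input).
    """
    # Segments that were still running after `elapsed` seconds.
    at_risk = [d for d in durations if d >= elapsed]
    if not at_risk:
        # Longer than any segment seen so far: a break is overdue.
        return 1.0
    # Of those, the ones that ended within the next `window` seconds.
    ending = [d for d in at_risk if d < elapsed + window]
    return len(ending) / len(at_risk)
```

With a hypothetical history of `[45, 60, 75, 90, 90, 105, 120, 120]`, the estimate is 0.0 at ten seconds (no story has ever been that short), 0.4 at 85 seconds, and 1.0 at 115 seconds.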

[0194] Double-Indexing

[0195] Another goal of the system would be to develop two levels of bookmarking--one equivalent to chapters and one equivalent to paragraphs. The methods discussed above could all be adapted to determining minor changes from large ones.

[0196] By the same token, the system could produce "hard" bookmarks (ones it is confident in) and "soft" bookmarks. An interface could be offered whereby viewers could be offered a choice of being very careful in their surfing by jumping through all the soft bookmarks, or a bit more relaxed and only deal with hard ones.
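A minimal sketch of the hard/soft distinction, assuming each candidate bookmark carries a confidence value between 0 and 1. The cut-off values `hard` and `soft` and both function names are assumptions for illustration.

```python
def classify_bookmarks(candidates, hard=0.8, soft=0.5):
    """Label candidates as 'hard' or 'soft'; discard those below `soft`.

    candidates: list of (time_in_seconds, confidence) pairs.
    """
    labeled = []
    for time, confidence in candidates:
        if confidence >= hard:
            labeled.append((time, "hard"))
        elif confidence >= soft:
            labeled.append((time, "soft"))
    return labeled


def visible_bookmarks(labeled, careful):
    """Careful surfing stops at every bookmark; relaxed surfing only at hard ones."""
    return [time for time, kind in labeled if careful or kind == "hard"]
```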

[0197] Training

[0198] The system could be trained to adaptively produce (better) auto-generated bookmarks, by looking, over time, for correlation between known accurate segment markers (generated by hand or other accurate means) and those deduced through the means discussed above. The knowledge gained in this learning process would be used to update the Bayesian probabilities.
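As one possible form of this learning step, the likelihoods for a given cue could be re-estimated by counting how often the cue fired at known true breaks and at known non-breaks. The sketch below assumes a simple list of labeled observations and uses add-one (Laplace) smoothing, which is a choice made for the example rather than something specified here.

```python
def estimate_cue_likelihoods(observations):
    """Estimate (P(cue | break), P(cue | no break)) from labeled data.

    observations: iterable of (cue_fired, is_true_break) boolean pairs, where
    is_true_break comes from hand-made or otherwise trusted segment markers.
    """
    fired_break = fired_no_break = n_break = n_no_break = 0
    for cue_fired, is_true_break in observations:
        if is_true_break:
            n_break += 1
            fired_break += int(cue_fired)
        else:
            n_no_break += 1
            fired_no_break += int(cue_fired)
    # Add-one smoothing keeps both estimates strictly between 0 and 1.
    p_given_break = (fired_break + 1) / (n_break + 2)
    p_given_no_break = (fired_no_break + 1) / (n_no_break + 2)
    return p_given_break, p_given_no_break
```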

[0199] System-Created Text

[0200] NLPEs can identify key words in the text. These could be assembled to form very cryptic slugs that would fit on a TV screen. They might form a sentence if there was room on the display (George Bush in China), or the slug might just be a list of key words (Bush China). The screen display could be set up so that the user could hit the right button once at a particular slug to see a sentence that scrolled off the screen, or could hit a button to access a longer descriptive piece about a story.
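The choice between a sentence and a key word list could be made by display width, as in the sketch below. It assumes the NLPE has already supplied a short sentence and a list of key words in order of importance; the function name `make_slug` and the character limit are hypothetical.

```python
def make_slug(sentence, keywords, max_chars):
    """Return the sentence if it fits, else as many leading key words as fit."""
    if len(sentence) <= max_chars:
        return sentence
    slug = ""
    for word in keywords:
        candidate = word if not slug else slug + " " + word
        if len(candidate) > max_chars:
            break  # the next key word would overflow the display
        slug = candidate
    return slug
```

With room for 30 characters, `make_slug("George Bush in China", ["Bush", "China"], 30)` returns the full sentence; with room for only 12, it returns `"Bush China"`.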

[0201] Abbreviations: The system would keep a large library of common abbreviations and use these when needed in the slugs or other descriptive text to save space. This feature could be turned on or off by the user. Locating the cursor on an abbreviated word, and selecting it, would present the whole word. Alternatively, a viewer could go to an index of abbreviations.

[0202] Auto-generated bookmarks could be created on a customized basis for each viewer. Some of this computation could be done on the client or the customization could occur by customizing the presentation of bookmarks created by a central system.

[0203] Preference setting by each viewer would customize the presentation in a number of ways. The user could input levels of "hardness", the density of bookmarks, and the maximum or minimum length of segments desired. The viewer might also be aware of the level of "maturity" of bookmarks--that is the number of previous viewers upon which the deduced bookmarks are based. Have the bookmarks stopped "moving"? The viewer could also input keywords that would signify extra interest.

[0204] Alternatively, the system could deduce these parameters (desired density, for instance) or keywords on a viewer-by-viewer basis. If a user continually skipped out of a segment shortly after landing each time, the system might deduce that the user was not that interested in the content and therefore reduce the density of presented bookmarks.

[0205] If the system deduced a keyword for a user, it could then find the closest bookmark with which to demarcate it. For instance, if a keyword is found, the system might lower its threshold of tolerance for creating a bookmark thus allowing one to appear shortly before the word. In this manner if the user is surfing rapidly through the content, they will be sure to catch that segment close to the relevant point. The keyword could be displayed on the screen as well. These bookmark parameters could be displayed to viewers as they watched as visual icons on the periphery of the screen. In this way, viewers would be reminded of the information with which they could be navigating. It would also teach viewers what the unseen metadata was and encourage its use. Viewers would also be made aware of whether they were navigating with NLPE-derived markups or behavior-based deduced bookmarks.

[0206] User-Controlled Settings

[0207] Errors. An NLPE will never do a perfect job. It will sometimes generate markups that shouldn't be there (false positives) and at other times, miss breaks that should be there (false negatives). Some users might have a preference for one or the other type of error. As such, viewers could have the option of setting a control that modulates the Bayesian statistical analysis tool such that one type of error or another was favored. (It is a bit of a zero-sum-game--trying to minimize one type of error will increase occurrences of the other.)
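The trade-off can be illustrated by treating the user control as a threshold on the break probability. The sketch below counts both error types at a given threshold; the scored sample data in the usage note are invented for the illustration.

```python
def error_counts(scored, threshold):
    """Return (false_positives, false_negatives) at the given threshold.

    scored: list of (break_probability, is_true_break) pairs.
    A bookmark is emitted whenever the probability is at or above the threshold.
    """
    false_positives = sum(1 for p, truth in scored if p >= threshold and not truth)
    false_negatives = sum(1 for p, truth in scored if p < threshold and truth)
    return false_positives, false_negatives
```

For the hypothetical sample `[(0.9, True), (0.6, True), (0.4, True), (0.5, False), (0.35, False), (0.1, False)]`, a low threshold of 0.3 gives two spurious markers and no missed breaks, while a high threshold of 0.7 gives no spurious markers and two missed breaks.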

[0208] User Selectable Lead Ins. Errors will also be made in finding the exact beginnings and ends of segments. If the system tries to hit the spot exactly, it might often cut off some of the pertinent material. This could be annoying and make the viewer have to scroll back to find the true beginning. Consequently, an end-marking bookmark may be delayed to add extra material to each identified segment just to "be on the long side". We envision having the user be able to modulate this type of error trade-off as well, as discussed above for marker error.
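A user-selectable margin of this kind might be applied as follows. The parameter names `lead_in` and `lead_out` and their default values are assumptions, and all times are in seconds from the start of the recording.

```python
def pad_segment(start, end, lead_in=2.0, lead_out=2.0, program_length=None):
    """Widen a detected segment so that playback errs on the long side."""
    # Start a little early, but never before the beginning of the recording.
    padded_start = max(0.0, start - lead_in)
    # End a little late, but never past the end of the recording if it is known.
    padded_end = end + lead_out
    if program_length is not None:
        padded_end = min(program_length, padded_end)
    return padded_start, padded_end
```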

[0209] User Selectable Segment Types. The preferred system would let users indicate preferences for certain segment types. For instance, a sports fan might indicate a preference for jump balls (recognized by a whistle and characteristic picture composition), or applause lines on the Letterman show. This preference information is more form-based than content-based, the usual parameter used for personalizing.

[0210] Using Bookmarks

[0211] Save Button. With this button, a user could take a segment associated with a bookmark (e.g., the programming between one bookmark and the next) as stored at 151 and drop it into a "scrapbook" or vault at 153. This scrapbook could have a specific amount of storage space at 153 allocated to it. This storage space might be actual hard drive space on a viewer's PVR or shared storage on a network. In the case of the shared storage, the viewer's "ownership" of the stored content would be indicated by metadata that associated that network-stored content with that viewer. Items dropped into the space could be assigned permanent or temporary storage. By default, segments would be sorted by, and assigned a label from, the show from which they came. The user could tag the segment with an additional tag via voice input, a keyboard, or by selecting from a menu. (Each show would have menus of clip-types that made sense for that show--pass plays, tackles, runs, etc. for football). The segments in the scrapbook could be sorted by category bucket and/or segment type. In addition, a user-operated "Scissors" tool could be used to clip off unwanted content. Playlists could be set up and segments could be sorted by type or by time, etc. Furthermore, the video scrapbook could be implemented by having the users do both cuts to define the segment. In this way, they won't rely on metadata to automatically copy the piece out of the stream, although they could use the metadata to navigate to the desired point at which to cut.

[0212] Scanning Playback. One playback tool that is not necessarily associated with automated markup may be thought of as a "scan mode," similar to that found in radios. In radios, scan will jump from channel to channel, giving you a sample of each. In an equivalent PVR feature, the TV could play short segments of each segment and then go to the next one if the viewer doesn't hit a Play button. This feature would be best applied where the chance of the user wanting to see a particular piece is low and it's too tedious to keep hitting the Next button.

[0213] "Sweet Spot" Surfing. An NLPE would be able to find the heart of a story. Currently this technology is used to summarize a newspaper article, for instance. In our application, the NLPE would identify the key segment of a news story or other segment. This would be the "sweet spot" of a segment. Our editors could also demarcate these sweet spots. Alternatively, they could be deduced by watching where our viewers put their systems into slow motion, or replayed the content. Sweet spot surfing could be a way to let users get right to the juicy part of each segment, without necessarily starting at the very beginning of the logical segment. It could be the spot that a viewer jumped to when hitting the Next button or employing the Scan button (see below).

[0214] Sweet spots could also be deduced by analyzing navigation patterns produced by other viewers. These would be the portions of segments never fast-forwarded. In this model of sweet-spot surfing, the system would set the bookmark at the beginning of this section and let viewers land right in the middle of the larger section.
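The navigation-based approach can be sketched by subtracting every logged fast-forward interval from the segment and bookmarking the start of the longest stretch that remains. The log format (pairs of start and end times in seconds, pooled across viewers) and the choice of the longest stretch are assumptions made for this example.

```python
def never_skipped_spans(seg_start, seg_end, ff_intervals):
    """Return the parts of [seg_start, seg_end] that no viewer fast-forwarded."""
    spans = []
    cursor = seg_start
    for a, b in sorted(ff_intervals):
        # Clip each fast-forward interval to the segment boundaries.
        a, b = max(a, seg_start), min(b, seg_end)
        if a >= b:
            continue
        if a > cursor:
            spans.append((cursor, a))
        cursor = max(cursor, b)
    if cursor < seg_end:
        spans.append((cursor, seg_end))
    return spans


def sweet_spot_bookmark(seg_start, seg_end, ff_intervals):
    """Bookmark the start of the longest never-skipped span, or None if none exists."""
    spans = never_skipped_spans(seg_start, seg_end, ff_intervals)
    if not spans:
        return None
    start, _ = max(spans, key=lambda span: span[1] - span[0])
    return start
```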

[0215] Segment Filters. The idea here is to treat ads or other repeatable segments of content as recordable scrapbook items. These segments could be fingerprinted and have a duration associated with them. When the PVR spotted an ad or other particular type of repeated segment, it would back up to the beginning of the segment and go to the end and demarcate the segment. These bookmarks could be loaded into the system for viewer use.

[0216] These segments could also be treated through a rules-based system assuming that they are ads. For instance, the rules might say that any ad that the viewer has seen "X" number of times, be deleted, or automatically skipped on playback, etc. In some cases, for instance with an ad-supported modality, the user might have to watch a segment of it before being allowed to continue on.
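A rule of the "seen X times" kind could be kept on the client as a per-fingerprint view counter, as sketched below. The class name `AdRules`, the string fingerprints, and the 'play'/'skip' return values are hypothetical.

```python
from collections import Counter


class AdRules:
    """Skip any fingerprinted ad the viewer has already seen `max_views` times."""

    def __init__(self, max_views):
        self.max_views = max_views
        self.views = Counter()  # fingerprint -> number of times played

    def on_ad_detected(self, fingerprint):
        """Return 'play' or 'skip' for the ad identified by `fingerprint`."""
        if self.views[fingerprint] >= self.max_views:
            return "skip"
        self.views[fingerprint] += 1
        return "play"
```

With `AdRules(max_views=2)`, the first two detections of a given ad return 'play' and the third returns 'skip', while a different ad is still played.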

[0217] Features similar to Never Again and the Huntlist described in the above-identified previously filed applications could be set up. The Never Again feature is a personalized list of segments, which the user does not wish to see again. This list can be stored on the client or on the network. The user could add a segment to the list during viewing by a command or could construct the list during a non-viewing session. The Huntlist is a similar sort of list, personalized by viewer and constructed in a similar manner. In this case, however, an item on the Huntlist would be given special status and highlighted in certain ways to bring it to the attention of the viewer. It could even be automatically saved for that viewer. The user would go to a database and request the system pull down Budweiser ads. Our system would then download fingerprints for those ads to the client PVR to be used to hunt for the relevant clips.

[0218] Other ways to identify ads would be to look for scrolling text, 800 numbers, and an abrupt change in the frequency of scene changes. In other cases, the "scene format" may suddenly go away. For instance, the Fox scoreboard in the upper left, or part of the Bloomberg information matrix may go away to make room for the ad. If it were a baseball game, scene recognition techniques could use a database of shots of infields, hitters, etc. to detect that the game was still showing. If it is a news show, there may not be a talking head in the picture. A database of newscaster facial images or voices could be maintained for each show, and if someone not from the list is deemed to be present, this fact could indicate an ad. Similarly, a database of products commonly advertised may be maintained and used to determine if advertising was being viewed or not. In addition, an algorithm detecting the excitement level in a voice or other tonal quality might indicate it was not a newscaster, or even an interviewee. Ads might also have people speaking more quickly. And currently, a lot of ads don't have closed caption text. Any number of these clues may be used in combination with Bayesian techniques to determine the probability of an ad break.

[0219] Note that some clues posit the location of a break point (switching of speakers, for instance) without knowing if there is a change in subject matter, whereas other clues indicate the presence of an ad (mention of a product name, for instance) without knowing where the break is. By combining both types of information, the content may be both segmented and categorized. This technique of combining content