| IP.com Number | IPCOM000152211D |
|---|---|
| Dated | Apr 26, 2007 UTC |
| Size | 3 page(s) (37.2 KB) |
| Language | English (United States) |
This document was submitted to IP.com's Prior Art Database.
User annotations to facilitate collaborative web crawling and indexing
Inventor - Gautham B Pai
IBM
Prior work has addressed how to obtain user annotations and how this information is stored and presented to the user. This publication describes how these annotations can be used in the crawling and indexing process of a search engine to make the indexed information more accurate and relevant to users.
The steps involved in this procedure are as follows:
1. Users annotate the content.
2. Search engine crawlers come across pages as part of the crawling process.
3. Search engines create the index considering the annotation information provided by the user.
4. Users perform search based on special tags.
5. Users rate the content.
6. Search engines use the ratings to refine the ranking of the results.
User annotations are not a new concept. Earlier work describes how interfaces can be provided for users to create annotations of web resources, and how these annotations can be stored and presented back to the users. [1]
Users come across information either through a search engine or by some other means, and may want to annotate it in different ways. An annotation can be as simple as attaching a set of keywords (tags) to the information, or it can state facts about the information in the form of triples (e.g., RDF) or in other similar ways. The user interface may be a browser plug-in that interacts with the web crawler asynchronously, or a page hosted by the search engine provider where the required information can be entered. The basic interface requires the user to provide the following information:
The URL pattern of the pages where the information is present.
The XPath expression identifying the specific content in the page.
The annotation that the user wants to associate with the content present at the specified section of the page.
The URL patterns can be supplied as regular expressions to cover a larger range of pages. Each resource that matches the regular expression has a specific section containing the information of interest, and this section is identified by an XPath. The XPath expressions may themselves contain regular expressions. Finally, the information is annotated either with labels (tags) or with facts in the form of triples (the information 'x' at this particular XPath is talking about 'y'), or in other ways.
There has been some work on the use of user information in the indexing process. For example, Morris et al. [2] describe an indexing process that uses information from clients' browsers. This publication differs, however, in that users contribute directly to the indexing process.
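As a rough illustration of the three-part annotation described earlier (URL pattern, XPath, annotation), the following Python sketch shows how a crawler might check whether an annotation applies to a page and pull out the annotated content. The `Annotation` class, the `apply_annotation` function, the example URL and the page markup are all hypothetical and are not taken from the publication. The sketch also assumes well-formed markup and uses only the XPath subset supported by Python's standard library, so regular expressions inside the XPath are not covered.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Annotation:
    """One user annotation: where it applies, what it points at, what it says."""
    url_pattern: str   # regular expression over page URLs
    xpath: str         # path to the annotated content within a matching page
    tags: tuple        # labels the user attaches to that content


def apply_annotation(annotation, url, xhtml):
    """Return (tags, text) if the annotation applies to this page, else None."""
    # The page URL must match the user's URL pattern in full.
    if re.fullmatch(annotation.url_pattern, url) is None:
        return None
    # Locate the annotated section. ElementTree supports only a subset of
    # XPath and requires well-formed markup (an assumption of this sketch).
    node = ET.fromstring(xhtml).find(annotation.xpath)
    if node is None:
        return None
    text = "".join(node.itertext()).strip()
    return annotation.tags, text


# Hypothetical annotation and page, for illustration only.
note = Annotation(
    url_pattern=r"http://cities\.example\.com/city/\w+",
    xpath=".//span[@class='dial']",
    tags=("dialing-code",),
)
page = "<html><body><span class='dial'>+91 80</span></body></html>"
result = apply_annotation(note, "http://cities.example.com/city/bangalore", page)
print(result)
```

Keeping the URL pattern and the XPath as separate fields mirrors the interface described above: one pattern selects many pages, and one path selects the same section in each of them.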
Example
Consider a website offering information about various cities. This site provides information such as the time offset from GMT or UTC, the latitude and longitude of each city, the dialing code for the city, etc. ...
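For a site of this kind, the later stages of the procedure (indexing pages under annotation tags, searching by tag, and using ratings to refine the ranking) might be sketched as follows. This is only a toy model: the `AnnotatedIndex` class, the URLs, the tag names and the choice of ranking by average rating (with unrated pages scored 0) are assumptions made for illustration, not details from the publication.

```python
from collections import defaultdict


class AnnotatedIndex:
    """Toy tag index whose result order is refined by user ratings."""

    def __init__(self):
        self.pages_by_tag = defaultdict(set)   # tag -> set of URLs
        self.ratings = defaultdict(list)       # (tag, url) -> list of scores

    def add(self, url, tags):
        """Index a crawled page under the tags supplied by user annotations."""
        for tag in tags:
            self.pages_by_tag[tag].add(url)

    def rate(self, tag, url, score):
        """Record one user's rating of a result for a tag."""
        self.ratings[(tag, url)].append(score)

    def search(self, tag):
        """Return URLs for a tag, highest average rating first."""
        def average(url):
            scores = self.ratings.get((tag, url), [])
            # Assumption: pages nobody has rated yet score 0.
            return sum(scores) / len(scores) if scores else 0.0
        # Sort by rating (descending), then by URL for a repeatable order.
        return sorted(self.pages_by_tag.get(tag, ()),
                      key=lambda u: (-average(u), u))


# Hypothetical pages and ratings, for illustration only.
bangalore = "http://cities.example.com/city/bangalore"
paris = "http://cities.example.com/city/paris"

index = AnnotatedIndex()
index.add(bangalore, ["dialing-code", "latitude"])
index.add(paris, ["dialing-code"])
index.rate("dialing-code", paris, 5)
index.rate("dialing-code", bangalore, 3)
ranked = index.search("dialing-code")
print(ranked)
```

Ratings are stored per (tag, page) pair so that a page can rank well for one tag without affecting its position for another.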