Scaling Personalized Web Search

Recent web search techniques augment traditional text matching with a global notion of “importance” based on the linkage structure of the web, such as in Google’s PageRank algorithm. For more refined searches, this global notion of importance can be specialized to create personalized views of importance—for example, importance scores can be biased according to a user-specified set of initiallyinteresting pages. Computing and storing all possible personalized views in advance is impractical, as is computing personalized views at query time, since the computation of each view requires an iterative computation over the web graph. We present new graph-theoretical results, and a new technique based on these results, that encode personalized views as partial vectors. Partial vectors are shared across multiple personalized views, and their computation and storage costs scale well with the number of views. Our approach enables incremental computation, so that the construction of personalized views from partial vectors is practical at query time. We present efficient dynamic programming algorithms for computing partial vectors, an algorithm for constructing personalized views from partial vectors, and experimental results demonstrating the effectiveness and scalability of our techniques.

General web search is performed predominantly through text queries to search engines. Because of the enormous size of the web, text alone is usually not selective enough to limit the number of query results to a manageable size. The PageRank algorithm, among others, has been proposed (and implemented in Google) to exploit the linkage structure of the web to compute global “importance” scores that can be used to influence the ranking of search results. To encompass different notions of importance for different users and queries, the basic PageRank algorithm can be modified to create “personalized views” of the web, redefining importance according to user preference. For example, a user may wish to specify his bookmarks as a set of preferred pages, so that any query results that are important with respect to his bookmarked pages would be ranked higher. While experimentation with the use of personalized PageRank has shown its utility and promise, the size of the web makes its practical realization extremely difficult. To see why, let us review the intuition behind the PageRank algorithm and its extension for personalization.


[Download the full research..]


Glen Jeh and Jennifer Widom – Stanford University