Ticket #41 (closed enhancement: fixed)

Opened 2 years ago

Last modified 1 year ago

Caching Documents at Time of Indexing

Reported by: John Hampton <paocpablo@pacopablo.com> Assigned to: athomas
Priority: major Component: pyndexter
Severity: normal Keywords: cache excerpt
Cc:

Description

The contents of the index can drift out of sync with the current contents of the sources. When that happens, a search may return a hit for text that no longer exists in the file.

For that case, it would be nice to have an option to enable caching, so that one could see the contents of the document as it existed when it was indexed.

Discussion in #trac

00:16 < pacopablo> is there a clean way of also returning a "dirty" flag of some sort
00:16 < pacopablo> indicating that the excerpt is from the indexed copy, but not in the current?
00:16 < alect> hrrm
00:17 < alect> good point
00:18 < pacopablo> then, kind of along the same thought, what about an optional Cache module/ability?
00:18 < pacopablo> obviously it would make the index ( or cache ) huge, but that's why it would be optional.
00:18 < pacopablo> unless you were planning of having the .indexed be a cached version of the page when it was indexed
00:19 < alect> .indexed would be whatever the Indexer can retrieve from its index
00:19 < pacopablo> so, then what about a cache module?
00:19 < alect> eg. in the case of BuiltinIndexer it would essentially just be an unordered soup of words
00:20 < pacopablo> so that one could produce the page as it was when indexed?
00:20 < alect> i think i could probably do that with a pseudo-indexer
00:20 < alect> cache://builtin:///some/path
00:21 < pacopablo> not super-high on need list, but an idea
00:21 < alect> it would overlay the other indexer
00:21 < alect> yeah
00:21 < alect> can you create a ticket for that?
00:21 < pacopablo> sure
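
The "dirty" flag raised in the discussion could be computed by comparing the copy cached at index time with the current source. A rough sketch only: `excerpt_with_dirty_flag` and its arguments are hypothetical names for illustration, not part of pyndexter.

```python
def excerpt_with_dirty_flag(term, indexed_text, current_text, width=20):
    """Return (excerpt, dirty) for a search term.

    The current source is tried first. If the term only appears in the
    copy cached at index time, the excerpt comes from that copy and
    dirty is True. Returns (None, False) if the term is in neither.
    """
    for text, dirty in ((current_text, False), (indexed_text, True)):
        pos = text.lower().find(term.lower())
        if pos != -1:
            start = max(0, pos - width)
            end = pos + len(term) + width
            return text[start:end], dirty
    return None, False


# Hypothetical example data: the word "cache" was removed after indexing.
indexed = "The cache module overlays another indexer."
current = "The module overlays another indexer."

stale_excerpt, stale_dirty = excerpt_with_dirty_flag("cache", indexed, current)
fresh_excerpt, fresh_dirty = excerpt_with_dirty_flag("module", indexed, current)
```

A caller could use the flag to mark such hits in the results instead of silently showing text that is no longer in the file.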

Change History

02/08/07 02:27:59 changed by athomas

  • status changed from new to assigned.

A pattern like this perhaps:

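from pyndexter import Framework
from pyndexter.cache import Cache  # proposed module

# Cache wraps the builtin indexer and is handed to the Framework in its place
framework = Framework(Cache('builtin:///tmp/index.idx'))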

...

Where Cache is an Indexer that transparently proxies to another Indexer.
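
To make the proxy idea concrete, here is a minimal, self-contained sketch. It is not pyndexter's implementation: `WordSoupIndexer`, `CachingIndexer`, and their `index`/`search`/`cached` methods are hypothetical stand-ins chosen only to illustrate an indexer that forwards to another indexer while keeping the full text it saw at index time.

```python
class WordSoupIndexer:
    """Hypothetical stand-in for a real indexer: keeps only a set of words per document."""

    def __init__(self):
        self._words = {}

    def index(self, uri, content):
        self._words[uri] = set(content.lower().split())

    def search(self, word):
        word = word.lower()
        return sorted(uri for uri, words in self._words.items() if word in words)


class CachingIndexer:
    """Hypothetical proxy: forwards calls to a wrapped indexer and caches full document text."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self._cache = {}

    def index(self, uri, content):
        # Keep the document exactly as it was when indexed, then delegate.
        self._cache[uri] = content
        self._wrapped.index(uri, content)

    def search(self, word):
        # Searching is handled entirely by the wrapped indexer.
        return self._wrapped.search(word)

    def cached(self, uri):
        # Returns None for documents that were never indexed through this proxy.
        return self._cache.get(uri)


indexer = CachingIndexer(WordSoupIndexer())
indexer.index("/some/path", "Caching documents at time of indexing")
hits = indexer.search("caching")
snapshot = indexer.cached(hits[0])
```

The wrapped indexer stays unaware of the cache, so caching remains optional and its storage cost is only paid when the proxy is used.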

02/11/07 20:14:04 changed by athomas

The builtin indexer now has a cache=true parameter which will cache the full content of each document, as well as its internal indexed representation.

08/14/07 22:15:55 changed by athomas

  • status changed from assigned to closed.
  • resolution set to fixed.
