Changeset 395
Timestamp: 02/15/07 00:39:39 (2 years ago)
Files:
- pyndexter/trunk/pyndexter/indexers/builtin.py (modified) (2 diffs)
- pyndexter/trunk/pyndexter/indexers/hype.py (modified) (1 diff)
- pyndexter/trunk/pyndexter/indexers/hyperestraier.py (modified) (1 diff)
- pyndexter/trunk/pyndexter/indexers/lucene.py (modified) (1 diff)
- pyndexter/trunk/pyndexter/indexers/lupy.py (modified) (1 diff)
- pyndexter/trunk/pyndexter/indexers/mock.py (modified) (1 diff)
- pyndexter/trunk/pyndexter/indexers/pyndex.py (modified) (1 diff)
- pyndexter/trunk/pyndexter/indexers/swishe.py (modified) (1 diff)
- pyndexter/trunk/pyndexter/indexers/xapian.py (modified) (1 diff)
- pyndexter/trunk/pyndexter/sources/file.py (modified) (1 diff)
- pyndexter/trunk/pyndexter/sources/__init__.py (modified) (1 diff)
- pyndexter/trunk/pyndexter/sources/metasource.py (modified) (1 diff)
- pyndexter/trunk/pyndexter/sources/mock.py (modified) (1 diff)
- pyndexter/trunk/pyndexter/stemmers/__init__.py (modified) (1 diff)
- pyndexter/trunk/pyndexter/stemmers/porter.py (modified) (1 diff)
- pyndexter/trunk/pyndexter/stemmers/snowball.py (modified) (1 diff)
pyndexter/trunk/pyndexter/indexers/builtin.py
r393 r395
 
 """
+Builtin Indexer
+---------------
+
 The builtin Pyndexter indexer.
 
-Maintains an inverted index of word:uri and uri:word.
+Pyndexter provides a basic inverted index indexer. It does not currently
+support substring matching, wildcards, or scoring, but these features are
+planned.
+
+Usage
+~~~~~
+
+::
+
+    builtin://<path>?compact=<bool>&cache=<bool>&dbm=<dbm>
+
+``compact=<bool>`` (default: ``true``)
+    Whether to compact the database as much as possible. Slight slowdown.
+
+``cache=<bool>`` (default: ``false``)
+    Should we keep a cached copy of each document as it is indexed?
+
+``dbm=<dbm>`` (default: ``anydbm``)
+    Supported dbm's are ``anydbm``, ``dbhash``, ``gdbm`` and ``dbm`` (Python 2.5).
+
+Installation
+~~~~~~~~~~~~
+
+No installation is required. Pyndexter uses the anydbm Python module for
+storage.
 """
 
…
 
 class BuiltinIndexer(Indexer):
-    """Constructor URI is:
-
-    builtin://<path>/?dbm=<dbm>&cache=<bool>&compact=<bool>
-
-    Supported dbm's are `anydbm`, `dbhash`, `gdbm` and `dbm` (Python 2.5).
-    `anydbm` is the default.
-
-    If `cache` is specified, the full original text of each document will
-    be stored in the index (large).
-
-    If `compact` is True, each word in the index is given a numeric ID.
-    Currently this has a fairly large performance impact, but it does
-    reduce the size of the index considerably.
-
-    eg.
-
-    builtin:///tmp/builtin.idx?dbm=gdbm
-
-    """
+    """Builtin Pyndexter indexer."""
     def __init__(self, framework, path, dbm='anydbm', cache=False,
-                 compact=False):
+                 compact=True):
         Indexer.__init__(self, framework)
 
pyndexter/trunk/pyndexter/indexers/hype.py
r387 r395
 
 """
-Adapter for Hyperestraier (http://hyperestraier.sourceforge.net/) using the
-Hype bindings (http://hype.python-hosting.com/).
+Hype
+----
+
+Adapter for Hyperestraier using the Hype bindings.
+
+Hype_ is a Python wrapper for Hyperestraier_. Hype is only available through
+SVN, but is quite stable and functional.
+
+.. _Hype: http://hype.python-hosting.com
+.. _Hyperestraier: http://hyperestraier.sourceforge.net/
+
+Usage
+~~~~~
+
+::
+
+    hype://<path>?hype_mode=<int>&enable_scoring=<bool>
+
+
+``hype_mode`` (default: auto)
+    Override the default ``READONLY``/``READWRITE`` modes in Pyndexter and use
+    Hyperestraier database open modes. See the Hyperestraier docs for details.
+
+``enable_scoring`` (default: ``true``)
+    Put Hyperestraier into a debug mode where scores are returned. This is
+    apparently somewhat slower, but I have not observed a massive difference.
+
+Installation
+~~~~~~~~~~~~
+
+Install your distribution's Hyperestraier package.
+
+::
+
+    svn co http://svn.hype.python-hosting.com/trunk hype
+    cd hype
+    python setup.py install
 """
 
pyndexter/trunk/pyndexter/indexers/hyperestraier.py
r387 r395
 
 """
-Adapter for Hyperestraier using the swigged bindings
-(http://hyperestraier.sourceforge.net/)
+Hyperestraier
+-------------
+
+Adapter for Hyperestraier_ using the swigged bindings.
+
+.. _Hyperestraier: http://hyperestraier.sourceforge.net/
+
+Usage
+~~~~~
+
+::
+
+    hyperestraier://<path>?hype_mode=<int>
+
+``hype_mode`` (default: auto)
+    Override the default ``READONLY``/``READWRITE`` modes in Pyndexter and use
+    Hyperestraier database open modes. See the Hyperestraier docs for details.
+
+Installation
+~~~~~~~~~~~~
+
+Install your distribution's Hyperestraier package (typically the package
+``hyperestraier``).
+
+If your distribution also includes the SWIG bindings as packages, install
+these, otherwise:
+
+::
+
+    wget http://hyperestraier.sourceforge.net/binding/hyper_estraier_wrappers-0.0.15.tar.gz
+    tar xfzv hyper_estraier_wrappers-0.0.15.tar.gz
+    cd hyper_estraier_wrappers-0.0.15
+    make
+    make install
 """
 
pyndexter/trunk/pyndexter/indexers/lucene.py
r387 r395
 # you should have received as part of this distribution.
 #
+
+"""
+Lucene
+------
+
+The Lucene adapter relies on PyLucene_, which is a Swig interface to a gcj
+compiled version of Java Lucene.
+
+PyLucene is good, but there are some serious compatibility issues with Python
+threading due to Java threading wanting to be the only implementation running.
+
+Usage
+~~~~~
+
+::
+
+    lucene://<path>
+
+Installation
+~~~~~~~~~~~~
+
+PyLucene_ is quite difficult to install. Either use your distribution's
+packaging system or, if you're brave, attempt a source installation. Beyond the
+scope of this hint.
+
+.. _PyLucene: http://pylucene.osafoundation.org/
+
+"""
 
 import os
pyndexter/trunk/pyndexter/indexers/lupy.py
r387 r395
 
 """
-Adapter for the deprecated, but still available from
-http://gentoo.prz.rzeszow.pl/distfiles/Lupy-0.2.1.tar.gz, Lupy indexer.
+Lupy
+----
+
+Lupy_ is a (deprecated) pure-Python indexer. It is excruciatingly slow,
+presumably because of its desire to be compatible with Lucene. Included
+as an exercise mostly :)
+
+.. _Lupy: http://www.divmod.org/projects/lupy
+
+Usage
+~~~~~
+
+::
+
+    lupy://<path>
+
+Installation
+~~~~~~~~~~~~
+
+::
+
+    easy_install http://gentoo.prz.rzeszow.pl/distfiles/Lupy-0.2.1.tar.gz
 """
 
pyndexter/trunk/pyndexter/indexers/mock.py
r393 r395
 # you should have received as part of this distribution.
 #
+
+"""
+Memory-only indexer used primarily for unit testing. Takes no options.
+"""
 
 from StringIO import StringIO
pyndexter/trunk/pyndexter/indexers/pyndex.py
r392 r395
 # you should have received as part of this distribution.
 #
+
+"""
+Pyndex
+------
+
+Pyndex_ is a pure-Python indexer written
+by the busy Divmod folks. It is quite fast, but again, no longer supported.
+
+**Note:** Pyndex does not support document deletion. I have hacked around this
+by inserting an empty document but this is obviously not ideal.
+
+.. _Pyndex: http://www.divmod.org/projects/pyndex
+
+Usage
+~~~~~
+
+::
+
+    pyndex://<path>
+
+Installation
+~~~~~~~~~~~~
+
+::
+
+    easy_install http://downloads.sourceforge.net/pyndex/Pyndex-0.3.2a.tar.gz
+"""
 
 import os
pyndexter/trunk/pyndexter/indexers/swishe.py
r380 r395
 
 """
-Search-only adapter for Swish-e, via the SwishE Python module (which doesn't
-appear to support indexing?)
+Swish-e
+-------
+
+`Swish-e <http://swish-e.org/>`_ is a popular indexer, typically used for internal web sites.
+
+This is a search-only adapter, implemented via the SwishE_ Python module (which
+doesn't appear to support indexing?). Indexing still has to be performed by
+Swish-e itself.
+
+Usage
+~~~~~
+
+::
+
+    swishe://<path>
+
+Installation
+~~~~~~~~~~~~
+
+::
+
+    easy_install SwishE
+
+.. _SwishE: http://jibe.freeshell.org/bits/SwishE/
 """
 
pyndexter/trunk/pyndexter/indexers/xapian.py
r387 r395
 
 """
-Adapter for Xapian (http://www.xapian.org)
+Xapian
+------
+
+Adapter for `Xapian <http://www.xapian.org>`_, a fast full-text indexing
+engine.
+
+Usage
+~~~~~
+
+::
+
+    xapian://<path>
+
+Installation
+~~~~~~~~~~~~
+
+Install Xapian for your distribution (typically the package ``xapian-core``).
+
+If your distribution also includes the SWIG bindings, install these, otherwise:
+
+::
+
+    wget http://www.oligarchy.co.uk/xapian/0.9.9/xapian-bindings-0.9.9.tar.gz
+    tar xfzv xapian-bindings-0.9.9.tar.gz
+    cd xapian-bindings-0.9.9
+    ./configure
+    make
+    make install
 """
 
pyndexter/trunk/pyndexter/sources/file.py
r387 r395
 
 """
-A document source for local filesystem. Accepts three optional arguments:
+File Source
+-----------
 
-include=<glob>
-exclude=<glob>
-predicate=<function>
+A document source for the local filesystem.
 
-Any files not excluded by the exclude pattern and included by the include
-pattern will match.
+The file source watches a path for changes in files matching a set of
+include/exclude patterns.
+
+Usage
+~~~~~
+
+::
+
+    file://<path>?include=<glob>&exclude=<glob>
+
+``include=<glob>`` (default: ``*``)
+    Multiple include globs can be provided. Specifies which files should be
+    included in the index.
+
+``exclude=<glob>``
+    Multiple exclude globs can be provided. Specifies which files should be
+    excluded from the index, even if they would otherwise match.
+
+Each file under ``<path>`` is first matched against the includes, then against
+excludes. If neither match, the file is not included.
 """
 
pyndexter/trunk/pyndexter/sources/__init__.py
r374 r395
 # you should have received as part of this distribution.
 #
+
+"""
+A Source is an object that is able to list and fetch a set of documents. It can
+typically also determine when a document under its domain needs to be
+reindexed.
+"""
pyndexter/trunk/pyndexter/sources/metasource.py
r374 r395
 # you should have received as part of this distribution.
 #
+
+"""
+The MetaSource is used internally by the Pyndexter framework.
+"""
 
 import pickle
pyndexter/trunk/pyndexter/sources/mock.py
r392 r395
 # you should have received as part of this distribution.
 #
+
+"""
+Used by the Pyndexter unit tests.
+"""
 
 from pyndexter import *
pyndexter/trunk/pyndexter/stemmers/__init__.py
r377 r395
 # you should have received as part of this distribution.
 #
+
+"""
+`Stemming <http://en.wikipedia.org/wiki/Stemmer>`_ is a process for reducing
+variants of a root word to that word.
+
+Pyndexter ships with a builtin English stemmer based on the Porter algorithm,
+but has an adapter for Snowball, a comprehensive multi-lingual stemmer.
+"""
pyndexter/trunk/pyndexter/stemmers/porter.py
r377 r395
+# -*- coding: utf-8 -*-
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution.
+#
+
 """Porter Stemming Algorithm
+
 This is the Porter stemming algorithm, ported to Python from the
 version coded up in ANSI C by the author. It may be regarded
pyndexter/trunk/pyndexter/stemmers/snowball.py
r377 r395
 # you should have received as part of this distribution.
 #
+
+"""
+Snowball
+--------
+
+`Snowball <http://snowball.tartarus.org/>`_ is a multi-language stemming
+library with `Python bindings <http://snowball.tartarus.org/wrappers/PyStemmer-1.0.1.tar.gz>`_.
+
+Usage
+~~~~~
+
+::
+
+    snowball://<language>
+
+``<language>``
+    Any of the languages supported by Snowball.
+
+
+Installation
+~~~~~~~~~~~~
+
+The Python bindings ship with the Snowball source, so it's an easy (and
+recommended) install.
+
+::
+
+    easy_install http://snowball.tartarus.org/wrappers/PyStemmer-1.0.1.tar.gz
+
+"""
 
 import Stemmer
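
The docstrings in this changeset all describe adapters through URIs of the form ``<scheme>://<path>?<option>=<value>``, for example ``builtin:///tmp/builtin.idx?dbm=gdbm``. The following is a minimal sketch of how such a URI could be split into its parts and how the documented defaults of the builtin indexer (``compact=true``, ``cache=false``, ``dbm=anydbm``) might be applied. It is written for Python 3 and is not taken from Pyndexter: the names ``parse_adapter_uri``, ``parse_bool`` and ``builtin_options`` are hypothetical, and the accepted boolean spellings are an assumption.

```python
from urllib.parse import parse_qs


def parse_adapter_uri(uri):
    """Split '<scheme>://<path>?<query>' into (scheme, path, options).

    Hypothetical helper, not Pyndexter's own parser. Options are returned
    as lists because an option may be repeated (e.g. several include globs).
    """
    scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError("expected <scheme>://<path>, got %r" % (uri,))
    path, _, query = rest.partition("?")
    return scheme, path, parse_qs(query)


def parse_bool(text):
    """Interpret a <bool> option value.

    Assumption: the source only shows 'true' and 'false'; the other
    spellings accepted here are a guess at reasonable leniency.
    """
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ValueError("not a boolean: %r" % (text,))


def builtin_options(options):
    """Apply the defaults documented for builtin:// to parsed options."""
    return {
        "compact": parse_bool(options.get("compact", ["true"])[-1]),
        "cache": parse_bool(options.get("cache", ["false"])[-1]),
        "dbm": options.get("dbm", ["anydbm"])[-1],
    }


scheme, path, options = parse_adapter_uri(
    "builtin:///tmp/builtin.idx?dbm=gdbm&cache=true")
settings = builtin_options(options)
```

When an option is repeated, this sketch keeps the last value for single-valued settings; how Pyndexter itself resolves repeats is not stated in the changeset.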

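
The file source docstring says that a file is matched first against the include globs and then against the exclude globs, and that excludes win even when a file would otherwise match. One possible reading of that rule can be sketched with the standard ``fnmatch`` module. This is an illustration under assumptions, not the code in ``sources/file.py``: the function name ``is_indexed`` is hypothetical, and matching on the bare file name rather than the full path is a guess.

```python
from fnmatch import fnmatch


def is_indexed(name, includes=("*",), excludes=()):
    """Return True if `name` passes the include/exclude filter.

    Hypothetical helper: a file must match at least one include glob
    (default '*', as documented) and must match no exclude glob.
    """
    if not any(fnmatch(name, glob) for glob in includes):
        return False
    return not any(fnmatch(name, glob) for glob in excludes)


# Example: index Python sources but skip compiled byte code.
candidates = ["indexer.py", "indexer.pyc", "README.txt"]
selected = [name for name in candidates
            if is_indexed(name, includes=("*.py*",), excludes=("*.pyc",))]
```

Checking includes before excludes keeps the default (``include=*`` with no excludes) equivalent to indexing every file under ``<path>``.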