Changeset 395

Timestamp:
02/15/07 00:39:39 (2 years ago)
Author:
athomas
Message:

pyndexter: Updated a lot of API documentation.

Files:

  • pyndexter/trunk/pyndexter/indexers/builtin.py

    r393 r395  
    88 
    99""" 
     10Builtin Indexer 
     11--------------- 
     12 
    1013The builtin Pyndexter indexer. 
    1114 
    12 Maintains an inverted index of word:uri and uri:word. 
     15Pyndexter provides a basic inverted index indexer. It does not currently 
     16support substring matching, wildcards, or scoring, but these features are 
     17planned. 
     18 
     19Usage 
     20~~~~~ 
     21 
     22:: 
     23 
     24    builtin://<path>?compact=<bool>&cache=<bool>&dbm=<dbm> 
     25 
     26``compact=<bool>`` (default: ``true``) 
     27    Whether to compact the database as much as possible. Slight slowdown. 
     28 
     29``cache=<bool>`` (default: ``false``) 
     30    Should we keep a cached copy of each document as it is indexed? 
     31 
     32``dbm=<dbm>`` (default: ``anydbm``) 
     33    Supported dbm's are ``anydbm``, ``dbhash``, ``gdbm`` and ``dbm`` (Python 2.5). 
     34 
     35Installation 
     36~~~~~~~~~~~~ 
     37 
     38No installation is required. Pyndexter uses the anydbm Python module for 
     39storage. 
    1340""" 
    1441 
     
    88115 
    89116class BuiltinIndexer(Indexer): 
    90     """Constructor URI is: 
    91  
    92         builtin://<path>/?dbm=<dbm>&cache=<bool>&compact=<bool> 
    93  
    94     Supported dbm's are `anydbm`, `dbhash`, `gdbm` and `dbm` (Python 2.5). 
    95     `anydbm` is the default. 
    96  
    97     If `cache` is specified, the full original text of each document will 
    98     be stored in the index (large). 
    99  
    100     If `compact` is True, each word in the index is given a numeric ID. 
    101     Currently this has a fairly large performance impact, but it does 
    102     reduce the size of the index considerably. 
    103  
    104     eg. 
    105  
    106         builtin:///tmp/builtin.idx?dbm=gdbm 
    107  
    108     """ 
     117    """Builtin Pyndexter indexer.""" 
    109118    def __init__(self, framework, path, dbm='anydbm', cache=False, 
    110                  compact=False): 
     119                 compact=True): 
    111120        Indexer.__init__(self, framework) 
    112121 
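The docstring above describes the builtin indexer as an inverted index that maps word:uri and uri:word. The following is a minimal in-memory sketch of that idea only. The class name ``TinyInvertedIndex`` and its methods are hypothetical and are not part of Pyndexter; the real indexer stores its data in a dbm file.

```python
from collections import defaultdict
import re


class TinyInvertedIndex:
    """Toy in-memory index (hypothetical; not Pyndexter's BuiltinIndexer)."""

    def __init__(self):
        self.word_to_uris = defaultdict(set)  # word -> URIs containing it
        self.uri_to_words = defaultdict(set)  # URI -> words it contains

    def index(self, uri, text):
        # Re-indexing a URI first drops its old postings.
        self.discard(uri)
        for word in re.findall(r"\w+", text.lower()):
            self.word_to_uris[word].add(uri)
            self.uri_to_words[uri].add(word)

    def discard(self, uri):
        # The uri -> word mapping makes removal cheap: only the posting
        # sets of words that the document contained are touched.
        for word in self.uri_to_words.pop(uri, set()):
            self.word_to_uris[word].discard(uri)
            if not self.word_to_uris[word]:
                del self.word_to_uris[word]

    def search(self, query):
        # Whole-word AND query: no substrings, wildcards or scoring.
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = set(self.word_to_uris.get(words[0], set()))
        for word in words[1:]:
            result &= self.word_to_uris.get(word, set())
        return result
```

Keeping both mappings is what allows a document to be removed or re-indexed without scanning every word in the index.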
  • pyndexter/trunk/pyndexter/indexers/hype.py

    r387 r395  
    88 
    99""" 
    10 Adapter for Hyperestraier (http://hyperestraier.sourceforge.net/) using the 
    11 Hype bindings (http://hype.python-hosting.com/). 
     10Hype 
     11---- 
     12 
     13Adapter for Hyperestraier using the Hype bindings. 
     14 
     15Hype_ is a Python wrapper for Hyperestraier_. Hype is only available through 
     16SVN, but is quite stable and functional. 
     17 
     18.. _Hype: http://hype.python-hosting.com 
     19.. _Hyperestraier: http://hyperestraier.sourceforge.net/ 
     20 
     21Usage 
     22~~~~~ 
     23 
     24:: 
     25 
     26    hype://<path>?hype_mode=<int>&enable_scoring=<bool> 
     27 
     28 
     29``hype_mode`` (default: auto) 
     30    Override the default ``READONLY``/``READWRITE`` modes in Pyndexter and use 
     31    Hyperestraier database open modes. See the Hyperestraier docs for details. 
     32 
     33``enable_scoring`` (default: ``true``) 
     34    Put Hyperestraier into a debug mode where scores are returned. This is 
     35    apparently somewhat slower, but I have not observed a massive difference. 
     36 
     37Installation 
     38~~~~~~~~~~~~ 
     39 
     40Install your distribution's Hyperestraier package. 
     41 
     42:: 
     43 
     44    svn co http://svn.hype.python-hosting.com/trunk hype 
     45    cd hype 
     46    python setup.py install 
    1247""" 
    1348 
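The indexer URIs documented in this changeset all follow the pattern ``<scheme>://<path>?<option>=<value>``. The sketch below shows one way such a URI could be split into its parts with the Python 3 standard library. The helper names ``parse_indexer_uri`` and ``parse_bool``, the accepted boolean spellings, and the ``hype_mode`` value used in the usage note are assumptions for illustration; Pyndexter's own parsing may differ.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed spellings for <bool> options; not taken from Pyndexter.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_bool(value):
    """Convert a textual <bool> option into a Python bool."""
    lowered = value.strip().lower()
    if lowered in _TRUE:
        return True
    if lowered in _FALSE:
        return False
    raise ValueError("not a boolean: %r" % (value,))


def parse_indexer_uri(uri):
    """Return (scheme, path, options) for a URI such as hype://<path>?..."""
    parts = urlsplit(uri)
    # parse_qs returns a list per key; keep the last value given.
    options = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    return parts.scheme, parts.netloc + parts.path, options
```

For example, ``parse_indexer_uri("hype:///tmp/hype.idx?hype_mode=3&enable_scoring=false")`` yields the scheme ``hype``, the path ``/tmp/hype.idx``, and the two options as strings, which can then be converted with ``int`` and ``parse_bool``.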
  • pyndexter/trunk/pyndexter/indexers/hyperestraier.py

    r387 r395  
    88 
    99""" 
    10 Adapter for Hyperestraier using the swigged bindings 
    11 (http://hyperestraier.sourceforge.net/) 
     10Hyperestraier 
     11------------- 
     12 
     13Adapter for Hyperestraier_ using the swigged bindings. 
     14 
     15.. _Hyperestraier: http://hyperestraier.sourceforge.net/ 
     16 
     17Usage 
     18~~~~~ 
     19 
     20:: 
     21 
     22    hyperestraier://<path>?hype_mode=<int> 
     23 
     24``hype_mode`` (default: auto) 
     25    Override the default ``READONLY``/``READWRITE`` modes in Pyndexter and use 
     26    Hyperestraier database open modes. See the Hyperestraier docs for details. 
     27 
     28Installation 
     29~~~~~~~~~~~~ 
     30 
     31Install your distribution's Hyperestraier package (typically the package 
     32``hyperestraier``). 
     33 
     34If your distribution also includes the SWIG bindings as packages, install 
     35these, otherwise: 
     36 
     37:: 
     38 
     39    wget http://hyperestraier.sourceforge.net/binding/hyper_estraier_wrappers-0.0.15.tar.gz 
     40    tar xfzv hyper_estraier_wrappers-0.0.15.tar.gz 
     41    cd hyper_estraier_wrappers-0.0.15 
     42    make 
     43    make install 
    1244""" 
    1345 
  • pyndexter/trunk/pyndexter/indexers/lucene.py

    r387 r395  
    66# you should have received as part of this distribution. 
    77# 
     8 
     9""" 
     10Lucene 
     11------ 
     12 
     13The Lucene adapter relies on PyLucene_, which is a Swig interface to a gcj 
     14compiled version of Java Lucene. 
     15 
     16PyLucene is good, but there are some serious compatibility issues with Python 
     17threading due to Java threading wanting to be the only implementation running. 
     18 
     19Usage 
     20~~~~~ 
     21 
     22:: 
     23 
     24    lucene://<path> 
     25 
     26Installation 
     27~~~~~~~~~~~~ 
     28 
     29PyLucene_ is quite difficult to install. Either use your distribution's 
     30packaging system or, if you're brave, attempt a source installation, which is 
     31beyond the scope of this document. 
     32 
     33.. _PyLucene: http://pylucene.osafoundation.org/ 
     34 
     35""" 
    836 
    937import os 
  • pyndexter/trunk/pyndexter/indexers/lupy.py

    r387 r395  
    88 
    99""" 
    10 Adapter for the deprecated, but still available from 
    11 http://gentoo.prz.rzeszow.pl/distfiles/Lupy-0.2.1.tar.gz, Lupy indexer. 
     10Lupy 
     11---- 
     12 
     13Lupy_ is a (deprecated) pure-Python indexer. It is excruciatingly slow, 
     14presumably because of its desire to be compatible with Lucene. Included 
     15as an exercise mostly :) 
     16 
     17.. _Lupy: http://www.divmod.org/projects/lupy 
     18 
     19Usage 
     20~~~~~ 
     21 
     22:: 
     23 
     24    lupy://<path> 
     25 
     26Installation 
     27~~~~~~~~~~~~ 
     28 
     29:: 
     30 
     31    easy_install http://gentoo.prz.rzeszow.pl/distfiles/Lupy-0.2.1.tar.gz 
    1232""" 
    1333 
  • pyndexter/trunk/pyndexter/indexers/mock.py

    r393 r395  
    66# you should have received as part of this distribution. 
    77# 
     8 
     9""" 
     10Memory-only indexer used primarily for unit testing. Takes no options. 
     11""" 
    812 
    913from StringIO import StringIO 
  • pyndexter/trunk/pyndexter/indexers/pyndex.py

    r392 r395  
    66# you should have received as part of this distribution. 
    77# 
     8 
     9""" 
     10Pyndex 
     11------ 
     12 
     13Pyndex_ is a pure-Python indexer written 
     14by the busy Divmod folks. It is quite fast, but again, no longer supported. 
     15 
     16**Note:** Pyndex does not support document deletion. I have hacked around this 
     17by inserting an empty document but this is obviously not ideal. 
     18 
     19.. _Pyndex: http://www.divmod.org/projects/pyndex 
     20 
     21Usage 
     22~~~~~ 
     23 
     24:: 
     25 
     26    pyndex://<path> 
     27 
     28Installation 
     29~~~~~~~~~~~~ 
     30 
     31:: 
     32 
     33    easy_install http://downloads.sourceforge.net/pyndex/Pyndex-0.3.2a.tar.gz 
     34""" 
    835 
    936import os 
  • pyndexter/trunk/pyndexter/indexers/swishe.py

    r380 r395  
    1111 
    1212""" 
    13 Search-only adapter for Swish-e, via the SwishE Python module (which doesn't 
    14 appear to support indexing?) 
     13Swish-e 
     14------- 
     15 
     16`Swish-e <http://swish-e.org/>`_ is a popular indexer, typically used for internal web sites. 
     17 
     18This is a search-only adapter, implemented via the SwishE_ Python module (which 
     19doesn't appear to support indexing?). Indexing still has to be performed by 
     20Swish-e itself. 
     21 
     22Usage 
     23~~~~~ 
     24 
     25:: 
     26 
     27    swishe://<path> 
     28 
     29Installation 
     30~~~~~~~~~~ 
     31 
     32:: 
     33 
     34    easy_install SwishE 
     35 
     36.. _SwishE: http://jibe.freeshell.org/bits/SwishE/ 
    1537""" 
    1638 
  • pyndexter/trunk/pyndexter/indexers/xapian.py

    r387 r395  
    88 
    99""" 
    10 Adapter for Xapian (http://www.xapian.org) 
     10Xapian 
     11------ 
     12 
     13Adapter for `Xapian <http://www.xapian.org>`_, a fast full-text indexing 
     14engine. 
     15 
     16Usage 
     17~~~~~ 
     18 
     19:: 
     20 
     21    xapian://<path> 
     22 
     23Installation 
     24~~~~~~~~~~~~ 
     25 
     26Install Xapian for your distribution (typically the package ``xapian-core``). 
     27 
     28If your distribution also includes the SWIG bindings, install these, otherwise: 
     29 
     30:: 
     31 
     32    wget http://www.oligarchy.co.uk/xapian/0.9.9/xapian-bindings-0.9.9.tar.gz 
     33    tar xfzv xapian-bindings-0.9.9.tar.gz 
     34    cd xapian-bindings-0.9.9 
     35    ./configure 
     36    make 
     37    make install 
    1138""" 
    1239 
  • pyndexter/trunk/pyndexter/sources/file.py

    r387 r395  
    88 
    99""" 
    10 A document source for local filesystem. Accepts three optional arguments: 
     10File Source 
     11----------- 
    1112 
    12     include=<glob> 
    13     exclude=<glob> 
    14     predicate=<function> 
     13A document source for local filesystem. 
    1514 
    16 Any files not excluded by the exclude pattern and included by the include 
    17 pattern will match. 
     15The file source watches a path for changes in files matching a set of 
     16include/exclude patterns. 
     17 
     18Usage 
     19~~~~~ 
     20 
     21:: 
     22 
     23    file://<path>?include=<glob>&exclude=<glob> 
     24 
     25``include=<glob>`` (default: ``*``) 
     26    Multiple include globs can be provided. Specifies which files should be 
     27    included in the index. 
     28 
     29``exclude=<glob>`` 
     30    Multiple exclude globs can be provided. Specifies which files should be 
     31    excluded from the index, even if they would otherwise match. 
     32 
     33Each file under ``<path>`` is matched against the include globs, then against 
     34the exclude globs. A file is indexed only if it matches an include and no exclude. 
    1835""" 
    1936 
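The include/exclude rule described in the file source docstring can be sketched with the standard ``fnmatch`` module. The function name ``is_included`` is hypothetical, and matching the globs against the whole relative path is an assumption; the actual file source may match differently.

```python
from fnmatch import fnmatchcase


def is_included(path, includes=("*",), excludes=()):
    """Return True if path matches an include glob and no exclude glob."""
    # Step 1: the file must match at least one include glob.
    if not any(fnmatchcase(path, glob) for glob in includes):
        return False
    # Step 2: an exclude glob overrides a matching include.
    return not any(fnmatchcase(path, glob) for glob in excludes)
```

With the default include of ``*``, ``is_included("notes.txt~", excludes=["*~"])`` is ``False``, because the exclude glob wins even though the include glob matches.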
  • pyndexter/trunk/pyndexter/sources/__init__.py

    r374 r395  
    66# you should have received as part of this distribution. 
    77# 
     8 
     9""" 
     10A Source is an object that is able to list and fetch a set of documents. It can 
     11typically also determine when a document under its domain needs to be 
     12reindexed. 
     13""" 
  • pyndexter/trunk/pyndexter/sources/metasource.py

    r374 r395  
    66# you should have received as part of this distribution. 
    77# 
     8 
     9""" 
     10The MetaSource is used internally by the Pyndexter framework. 
     11""" 
    812 
    913import pickle 
  • pyndexter/trunk/pyndexter/sources/mock.py

    r392 r395  
    66# you should have received as part of this distribution. 
    77# 
     8 
     9""" 
     10Used by the Pyndexter unit tests. 
     11""" 
    812 
    913from pyndexter import * 
  • pyndexter/trunk/pyndexter/stemmers/__init__.py

    r377 r395  
    66# you should have received as part of this distribution. 
    77# 
     8 
     9""" 
     10`Stemming <http://en.wikipedia.org/wiki/Stemmer>`_ is a process for reducing 
     11variants of a root word to that word. 
     12 
     13Pyndexter ships with a builtin English stemmer based on the Porter algorithm, 
     14but has an adapter for Snowball, a comprehensive multi-lingual stemmer. 
     15""" 
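To illustrate what reducing variants to a root word means, here is a deliberately crude suffix-stripping sketch. It is not the Porter algorithm and not the Snowball API; the function name ``crude_stem`` and the suffix list are made up for this example, and real stemmers apply many more rules.

```python
# Suffixes tried in order; a toy list, far smaller than Porter's rule set.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word
```

With this sketch, ``index``, ``indexes``, ``indexed`` and ``indexing`` all reduce to ``index``, so a search for any one of them can match documents containing the others.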
  • pyndexter/trunk/pyndexter/stemmers/porter.py

    r377 r395  
     1# -*- coding: utf-8 -*- 
     2# 
     3# This software is licensed as described in the file COPYING, which 
     4# you should have received as part of this distribution. 
     5# 
     6 
    17"""Porter Stemming Algorithm 
     8 
    29This is the Porter stemming algorithm, ported to Python from the 
    310version coded up in ANSI C by the author. It may be regarded 
  • pyndexter/trunk/pyndexter/stemmers/snowball.py

    r377 r395  
    66# you should have received as part of this distribution. 
    77# 
     8 
     9""" 
     10Snowball 
     11-------- 
     12 
     13`Snowball <http://snowball.tartarus.org/>`_ is a multi-language stemming 
     14library with `Python bindings <http://snowball.tartarus.org/wrappers/PyStemmer-1.0.1.tar.gz>`_. 
     15 
     16Usage 
     17~~~~~ 
     18 
     19:: 
     20 
     21    snowball://<language> 
     22 
     23``<language>`` 
     24    Any of the languages supported by Snowball. 
     25 
     26 
     27Installation 
     28~~~~~~~~~~~~ 
     29 
     30The Python bindings ship with the Snowball source, so it's an easy (and 
     31recommended) install. 
     32 
     33:: 
     34 
     35    easy_install http://snowball.tartarus.org/wrappers/PyStemmer-1.0.1.tar.gz 
     36 
     37""" 
    838 
    939import Stemmer