Pyndexter - A full text indexing/search abstraction layer for Python

Pyndexter (pronounced 'poindexter') is an abstraction layer for full-text indexing engines. It presents a uniform query syntax to the user, includes a basic but functional pure-Python indexer, and has adapters for Hype, Hyperestraier, Lucene, Lupy, Pyndex, Swish-e and Xapian.
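
To make the idea of an abstraction layer concrete, here is a minimal sketch of a common indexer interface with one tiny pure-Python implementation behind it. Every name in it (Indexer, InMemoryIndexer, index, search) is hypothetical and chosen for illustration; none is taken from Pyndexter's actual API, and the query handling is a plain AND of terms rather than Pyndexter's query syntax.

```python
# Hypothetical sketch of an indexing abstraction layer. These class and
# method names are assumptions for illustration, not Pyndexter's real API.
import re
from collections import defaultdict


class Indexer:
    """Common interface that each engine adapter would implement."""

    def index(self, key, text):
        raise NotImplementedError

    def search(self, query):
        raise NotImplementedError


class InMemoryIndexer(Indexer):
    """Tiny pure-Python inverted index: term -> set of document keys."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _terms(text):
        # Lowercase the text and split it into word tokens.
        return re.findall(r"\w+", text.lower())

    def index(self, key, text):
        # Record the document key under every term it contains.
        for term in self._terms(text):
            self._postings[term].add(key)

    def search(self, query):
        # Treat the query as an implicit AND of its terms.
        terms = self._terms(query)
        if not terms:
            return set()
        results = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self._postings.get(term, set())
        return results


indexer = InMemoryIndexer()
indexer.index("doc1", "Full text indexing for Python")
indexer.index("doc2", "Searching text with Xapian")
print(sorted(indexer.search("text")))         # ['doc1', 'doc2']
print(sorted(indexer.search("text python")))  # ['doc1']
```

Calling code depends only on the shared interface, so an adapter for another engine could be swapped in without changing how documents are added or queried.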

How do I install it?

Pyndexter should be installable with setuptools:

easy_install pyndexter

Where can I get it?

Releases of pyndexter can be obtained here:

0.2 (major API changes)

pyndexter-0.2.tar.gz (5c0240a7dc1ca9d105c338f27cfa116d)
pyndexter-0.2-py2.4.egg (bc2a61eea717c4ea8fd6f322a262a5f7)

0.1

pyndexter-0.1.tar.gz (b88658482d92aae72a0d613bdf4b25a8)
pyndexter-0.1-py2.4.egg (e5ad04af5b6907bbadac5ff6a60e9252)

Source

You can obtain a ZIP of the current source tree from here, or check out the source with Subversion like so:

svn co http://swapoff.org/svn/pyndexter/trunk pyndexter-trunk

Finally, you can browse the source with Trac.

Documentation

Refer to the documentation index.

What's left to do?

There is a long list of tasks in a Dev Todo list.

Problems? Contributions?

If you find a bug or want to contribute, please create a ticket.

You might also want to check existing tickets to see if anybody else has had the same problem.

Performance

These timings are neither exhaustive nor scientific, but they give a rough comparison of some of the indexers available under 0.2.

Indexing was performed on 308 IRC .log files in ./#trac, totalling 16472KB.

Indexer        Initial index time (s)  Index size (KB)
Builtin        224                     17460
Hype           36                      20208
Hyperestraier  28                      20208
Lucene         48                      4544
Xapian         65                      28128
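
The raw figures are easier to compare as throughput and as index size relative to the corpus. The short script below does that arithmetic using only the numbers from the table above; the helper name summarize is made up for this example.

```python
# Figures copied from the table above; the corpus is the 16472KB of IRC logs.
CORPUS_KB = 16472

timings = {
    # indexer: (initial index time in seconds, index size in KB)
    "Builtin": (224, 17460),
    "Hype": (36, 20208),
    "Hyperestraier": (28, 20208),
    "Lucene": (48, 4544),
    "Xapian": (65, 28128),
}


def summarize(seconds, index_kb, corpus_kb=CORPUS_KB):
    """Return (throughput in KB/s, index size as a fraction of corpus size)."""
    return corpus_kb / seconds, index_kb / corpus_kb


for name, (seconds, index_kb) in timings.items():
    throughput, ratio = summarize(seconds, index_kb)
    print(f"{name:<14}{throughput:8.1f} KB/s  index/corpus = {ratio:.2f}")
```

On these numbers Hyperestraier indexes at roughly 588 KB/s against about 74 KB/s for the builtin indexer, and Lucene's index is the only one smaller than the corpus itself, at about 0.28 times its size.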

History

Pyndexter was originally inspired by the need for full text indexing for the Trac repository search plugin. I wrote my own indexer in pure Python for the plugin, but for large repositories the indexing speed was sub-optimal.

Change Log

[486] by athomas on 12/11/07 21:02:26

pyndexter: Fix for port parsing bug in util.URI.

[466] by athomas on 11/26/07 08:09:26

pyndexter: Documents are now referenced by a user-defined key rather than a URI. It was a needless restriction. Fixed hyperestraier unit tests up a bit.

[465] by athomas on 11/26/07 07:40:47

pyndexter: More work on the simplification refactoring branch.

[459] by athomas on 08/24/07 00:44:47

pyndexter: Re-added stemmers and added an entry_point for them.

[458] by athomas on 08/23/07 23:26:25

pyndexter: More cleanup for new branch.

[457] by athomas on 08/23/07 22:21:21

pyndexter: Initial commit of simplification refactoring branch.

[456] by athomas on 08/22/07 07:56:24

pyndexter: Skip unsupported NOT query test for builtin indexer.

[454] by athomas on 08/22/07 06:47:37

pyndexter: Whoops.

[453] by athomas on 08/22/07 06:42:12

pyndexter: Moved to hyperestraier pure-Python module, fixed setup.py.

[452] by athomas on 08/21/07 08:28:15

pyndexter: All modules are now prefixed with _ to avoid import collisions. Updated unit tests.

[451] by athomas on 08/21/07 07:25:45

pyndexter: Added sub-expression and attr:value support to the query parser.