Posts in category python

Emulating ActionScript's "with" statement in Python

If you haven't had much to do with ActionScript or Visual Basic, right now you might be thinking, as I did, "huh?" Basically the with statement in both of these languages allows one to access the attributes of an object as if they were in the local scope.

I know, I know. Why would you want to do this? I have no idea, but I thought it was cool that it was even possible in Python. However, there are some fairly major limitations:

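  • It only works at global (module) scope (I neglected to mention this originally, apologies; see #74 for details). Inside a function, names injected through the frame's f_locals are not visible to the function body. One can s/f_locals/f_globals/g, but this introduces its own problems.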
  • It only works for objects with a mutable __dict__.
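
Here is why the global-scope restriction exists. At module level a frame's f_locals is the module's globals dictionary itself, so injected names are visible to later lookups. Inside a function, a name that is never assigned there is compiled as a global lookup, so anything written into the frame's f_locals is never consulted. Below is a small Python 3 demonstration. The inject() and in_function() helpers are hypothetical, and exec() with a single dict stands in for module-level code:

```python
import inspect


def inject(name, value):
    """Hypothetical helper: bind name in the *caller's* f_locals."""
    inspect.currentframe().f_back.f_locals[name] = value


def in_function():
    inject('answer', 42)
    try:
        # 'answer' is never assigned here, so the compiler emits a global
        # lookup; whatever was written into this frame's f_locals is ignored.
        return answer
    except NameError:
        return None


# exec() with a single dict behaves like module-level code: the frame's
# locals and globals are that same dict, so the injected name is visible.
module_namespace = {'inject': inject}
exec("inject('answer', 42)\nresult = answer", module_namespace)

print(in_function())               # None
print(module_namespace['result'])  # 42
```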

I also thought the approach of injecting symbols into the context could be useful for providing DSL-like features, e.g.:

with query('a', 'b', 'c') as result:
    SELECT(a.foo, b.bar, c.baz)
    WHERE(AND(a.foo == 10, b.bar == 20))
    ORDER_BY(a.foo)
    GROUP_BY(b.bar)
    for row in result:
        print row.foo, row.bar, row.baz

etc.
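
The symbol-injection idea does not require frame introspection if the caller passes the target namespace in explicitly. The sketch below is written for Python 3 and works that way. The names injected and _select are hypothetical, and this is an alternative to the frame-based scope class further down, not a reconstruction of it:

```python
from contextlib import contextmanager

_MISSING = object()


@contextmanager
def injected(namespace, **symbols):
    """Temporarily bind symbols in namespace (for example, globals()).

    On exit, names the symbols shadowed are restored and names that did
    not exist beforehand are removed, even if the block raised.
    """
    saved = dict((name, namespace.get(name, _MISSING)) for name in symbols)
    namespace.update(symbols)
    try:
        yield namespace
    finally:
        for name, old in saved.items():
            if old is _MISSING:
                namespace.pop(name, None)
            else:
                namespace[name] = old


def _select(*columns):
    """Hypothetical DSL verb used for the demonstration."""
    return 'SELECT ' + ', '.join(columns)


# SELECT only exists inside the block.
with injected(globals(), SELECT=_select):
    statement = SELECT('a.foo', 'b.bar')

print(statement)               # SELECT a.foo, b.bar
print('SELECT' in globals())   # False
```

Because the namespace is explicit, this also works inside functions as long as the DSL names are looked up as globals.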

But enough speculation. Here is the actual code, which requires at least Python 2.5; the doctests in its docstring double as usage examples.

from __future__ import with_statement
import inspect


class scope(object):
    """A context that maps an objects attributes into the current lexical
    scope, only for the duration of the context.

    NB. Properties will NOT work.

    >>> import math
    >>> with scope(math):
    ...     print sqrt(9)
    3.0

    >>> class Vector:
    ...     def __init__(self, x, y):
    ...         self.x = x
    ...         self.y = y
    ...     def length(self):
    ...         return math.sqrt(self.x ** 2 + self.y ** 2)
    >>> v = Vector(3, 3)
    >>> with scope(v):
    ...     print x, '%0.2f' % length()
    ...     x = 4
    ...     print x, '%0.2f' % length()
    3 4.24
    4 5.00
    >>> print v.x, v.y, v.length()
    4 3 5.0
    >>> print x
    Traceback (most recent call last):
    ...
    NameError: name 'x' is not defined
    """
    def __init__(self, obj):
        self.members = dict(inspect.getmembers(obj))
        self.obj = obj

    def __enter__(self):
        parent = inspect.currentframe().f_back
        # Preserve the parent's locals() and the object's dictionary
        self.old_locals = dict(parent.f_locals)
        self.old_dict = self.obj.__dict__
        # Update locals with object members
        parent.f_locals.update(self.members)
        # If possible, replace the object's __dict__ with our updated locals.
        # Functions on the object that operate on the attributes will then
        # continue to work as expected. The downside is that all other objects
        # in the parent's scope will be included as well.
        try:
            self.obj.__dict__ = parent.f_locals
            self.fake_dict = True
        except (TypeError, AttributeError):
            self.fake_dict = False

    def __exit__(self, type, value, traceback):
        parent = inspect.currentframe().f_back
        if self.fake_dict:
            object_dict = self.old_dict
        else:
            object_dict = self.obj.__dict__
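        # Replicate member state from locals to the object's __dict__, then
        # restore any names in the parent scope that the members shadowed.
        for key in self.members:
            if key in parent.f_locals:
                object_dict[key] = parent.f_locals[key]
            else:
                # The name was deleted inside the block, so mirror that.
                object_dict.pop(key, None)
            if key in self.old_locals:
                parent.f_locals[key] = self.old_locals[key]
            else:
                parent.f_locals.pop(key, None)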
        if self.fake_dict:
            self.obj.__dict__ = object_dict
        del self.old_locals
        del self.old_dict
        return False


if __name__ == '__main__':
    import doctest
    doctest.testmod()

Walking a py3k AST

The compiler module has been removed from py3k. Fortunately there's a replacement in the _ast module introduced in Python 2.5. Unfortunately, while the compiler module has a useful __repr__esentation for its AST objects...

This is not the case for the _ast module, so I wrote a function that dumps _ast.AST objects as a dict.
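
Below is a minimal sketch of such a function. It uses the ast module, the Python 2.6+ wrapper around _ast. ast_to_dict and the '_type' key are hypothetical names, not a standard format. If a string is enough, the standard library's ast.dump() already produces one:

```python
import ast


def ast_to_dict(node):
    """Recursively convert an AST into plain dicts, lists and values.

    Each node becomes a dict holding its class name under '_type' plus one
    entry per field; lists of nodes are converted element by element.
    """
    if isinstance(node, ast.AST):
        result = {'_type': type(node).__name__}
        for field, value in ast.iter_fields(node):
            result[field] = ast_to_dict(value)
        return result
    if isinstance(node, list):
        return [ast_to_dict(item) for item in node]
    return node  # identifiers, constants, None


tree = ast_to_dict(ast.parse('x = 1'))
assignment = tree['body'][0]
print(assignment['_type'])               # Assign
print(assignment['targets'][0]['id'])    # x
```

The exact set of fields varies between Python versions (Python 3.8+ uses Constant for literals, for example), so the dict layout follows whichever interpreter parsed the code.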

Type masquerading using dynamic base classes

Python allows us to construct new classes on-the-fly with type(). By (ab)using this feature we can construct a class that preserves its own interface while assuming all of the behaviour of an existing object. This is useful for wrapping compound types such as dictionaries, lists and sets without having to proxy __getitem__, __iter__ and so on, or write a custom __getattr__.

One example of when this could be useful is a JSON-specific HTTP client object that assumes the type of the JSON data, while maintaining response-specific attributes such as headers and status:

import simplejson


class Response(object):
  """A response object that masquerades as the decoded content type."""
  def __new__(cls, content=None, headers=None, status=None):
    if content is not None:
      content = simplejson.loads(content)
      assumed_type = type(content)

      bases = (Response, assumed_type)
      name = assumed_type.__name__.title() + 'Response'
      cls = type(name, bases, {})

      self = assumed_type.__new__(cls, content)
      self.assumed_type = assumed_type
    else:
      self = object.__new__(cls)
    return self

  def __init__(self, content=None, headers=None, status=None):
    if content is not None:
      super(Response, self).__init__(content)
    self.headers = headers
    self.status = status


json_data = [
  '{"foo": 1, "bar": 2}',
  '123',
  '123.5',
  '["foo", "bar"]',
  ]


for data in json_data:
  response = Response(data, headers=[('Content-Type', 'application/json')],
                      status=200)
  decoded = simplejson.loads(data)
  print response
  print '  Same type?', isinstance(response, type(decoded))
  try:
    print '  Iteration:',
    print [i for i in response]
  except TypeError:
    print '(type does not support iteration)'
  print '  Headers:', response.headers
  print '  Status:', response.status
  print

Outputs this:

{u'foo': 1, u'bar': 2}
  Same type? True
  Iteration: [u'foo', u'bar']
  Headers: [('Content-Type', 'application/json')]
  Status: 200

123
  Same type? True
  Iteration: (type does not support iteration)
  Headers: [('Content-Type', 'application/json')]
  Status: 200

123.5
  Same type? True
  Iteration: (type does not support iteration)
  Headers: [('Content-Type', 'application/json')]
  Status: 200

[u'foo', u'bar']
  Same type? True
  Iteration: [u'foo', u'bar']
  Headers: [('Content-Type', 'application/json')]
  Status: 200
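
Porting this to Python 3 needs two adjustments. First, the standard json module can replace simplejson. Second, the super(Response, self).__init__(content) call fails for int, float and str, because it reaches object.__init__, which in Python 3 raises TypeError when given extra arguments here. The minimal sketch below is an adaptation, not the original code, and does all construction in __new__ instead. JSON true, false and null still cannot masquerade, since bool and NoneType cannot be subclassed:

```python
import json


class Response(object):
    """Sketch: a response that masquerades as its decoded JSON content."""

    def __new__(cls, content=None, headers=None, status=None):
        if content is None:
            self = super().__new__(cls)
        else:
            decoded = json.loads(content)
            assumed_type = type(decoded)
            # Build a throwaway class that inherits from both Response and
            # the decoded type, e.g. DictResponse(Response, dict).
            name = assumed_type.__name__.title() + 'Response'
            masquerade = type(name, (cls, assumed_type), {})
            # Immutable types (int, float, str) take their value in __new__.
            self = assumed_type.__new__(masquerade, decoded)
            if isinstance(decoded, (dict, list)):
                # Mutable containers are filled by __init__, so do it here.
                assumed_type.__init__(self, decoded)
            self.assumed_type = assumed_type
        self.headers = headers
        self.status = status
        return self

    def __init__(self, content=None, headers=None, status=None):
        # Everything happens in __new__. Defining __init__ here stops the
        # builtin base's __init__ from receiving the constructor arguments.
        pass
```

As in the original, every call builds a fresh class, so isinstance(response, Response) holds but two responses of the same JSON type do not share a class.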

CLY 0.9 released

CLY is a project I've been working on for a while now, and I've finally gotten around to releasing it.

It's basically a CLI parser and grammar constructor that lets you easily add command-line interfaces to your applications:

echo.py:

from cly import *

def echo(text):
    print text

grammar = Grammar(
    echo=Node(help='Echo text')(
        text=Variable(help='Text to echo', pattern=r'.+')(
            Action(callback=echo),
            ),
        ),
    )

interact(grammar)

CLY automatically generates contextual help and provides tab completion (fully customisable). If you ran the above program it would work like this:

cly> ?
  echo Echo text
cly> echo
         ^ more input required (expected <text>)
cly> echo ?
  <text> Text to echo
cly> echo some text
some text

Grammars can also be defined in XML. Here's the above example rewritten to use an XML grammar:

echo.xml:

<?xml version="1.0"?>
<grammar xmlns="http://swapoff.org/cly/xml">
  <node name="echo" help="Echo text">
    <variable name="text" pattern=".+" help="Text to echo">
      <action callback="echo"/>
    </variable>
  </node>
</grammar>

echo.py:

from cly import *

def echo(text):
    print text

grammar = Grammar.from_xml(open('echo.xml').read(), echo=echo)

interact(grammar)

More examples are available in the tutorial, developers guide and the API documentation.

Adding support for a workingenv sandbox to setuptools/distutils

It seems that all I blog about is workingenv.

This time it's a snippet of code that adds a "sandbox" command to distutils, which automatically creates a workingenv from the setuptools extras_require, install_requires and dependency_links options listed in your setup() call. It only supports *nix systems for now, but could easily be extended to support Windows.

Here's the code:

import os
from setuptools import setup, find_packages
from distutils.cmd import Command

class sandbox(Command):
    description = 'Create a development sandbox using workingenv'
    user_options = [
        ('path=', None, 'workingenv path'),
        ('extras', None, 'also include "extras" requirements'),
        ]

    def initialize_options(self):
        self.path = 'wenv'
        self.extras = False

    def finalize_options(self):
        pass

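    def run(self):
        # Collect dependency links and requirements into a file that
        # workingenv can read with --requirements.
        requirements = list(self.distribution.dependency_links or []) + \
                       list(self.distribution.install_requires or [])
        if self.extras:
            extras = self.distribution.extras_require or {}
            # Each extra maps to a list of requirement strings (or a single
            # string), so flatten them instead of appending whole lists.
            for extra in extras.values():
                if isinstance(extra, basestring):
                    requirements.append(extra)
                else:
                    requirements.extend(extra)
        requires = open('requirements.txt', 'w')
        try:
            requires.write('\n'.join(requirements))
        finally:
            requires.close()
        cwd = os.getcwd()
        import workingenv
        workingenv.main(['--always-unzip', '--requirements=requirements.txt',
                         '--site-packages', '--verbose', self.path])
        os.chdir(cwd)
        # Point ./sandbox at the activation script (unless it already exists).
        if not os.path.lexists('sandbox'):
            os.symlink(os.path.join(self.path, 'bin', 'activate'), 'sandbox')
        print
        print 'XXX: Use ". sandbox" to activate the development sandbox'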

setup(
    name='MyCoolPackage',
    version='0.0.0.1',
    packages=find_packages(),
    # Add the sandbox command
    cmdclass={'sandbox': sandbox},
    # Search some extra locations for dependencies
    dependency_links=[
        'http://svn.edgewall.org/repos/genshi/trunk#egg=Genshi-dev',
        'http://trac.pocoo.org/repos/werkzeug/trunk#egg=Werkzeug-dev',
        'http://svn.sqlalchemy.org/sqlalchemy/trunk#egg=SQLAlchemy-dev',
    ],
    install_requires=[
        'setuptools >= 0.6b1',
        'Genshi >= 0.5.dev-r698,==dev',
        'Werkzeug >= 0.1.dev-r3831,==dev',
        'SQLAlchemy >= 0.4.0.dev-r3203,==dev',
        'AuthKit >= 0.3.0pre5',
    ],
)

And here's an example of how to use it:

$ python setup.py sandbox --help
Common commands: (see '--help-commands' for more)

...

Options for 'sandbox' command:
  --path    workingenv path
  --extras  also include "extras" requirements

...
$ python setup.py sandbox --path=mysandbox --extras
running sandbox
Reading requirement requirements.txt
Making working environment in /home/athomas/p/test/mysandbox
Creating lib/python2.5

...

...Installing http://svn.edgewall.org/repos/genshi/trunk#egg=Genshi-dev,
http://trac.pocoo.org/repos/werkzeug/trunk#egg=Werkzeug-dev,
http://svn.sqlalchemy.org/sqlalchemy/trunk#egg=SQLAlchemy-dev,
setuptools >= 0.6b1, Genshi >= 0.5.dev-r698,==dev, Werkzeug >=
0.1.dev-r3831,==dev, SQLAlchemy >= 0.4.0.dev-r3203,==dev, AuthKit >= 0.3.0pre5
...done.

XXX: Use ". sandbox" to activate the development sandbox

April SyPy Presentation on PyCon

This month's SyPy meeting attracted a larger number of people than usual, around 35 or so (compared to the usual 5 or 6).

I gave a talk about my trip to PyCon, which was received fairly well I think (I judge it a success by not being booed off the stage). S5 version is here.

Andrew Bennetts gave a pretty interesting talk on Bazaar, although a lot of it was an introduction to distributed VCSes in general. One particularly interesting aspect of Bazaar is its plugin system.

Activating a `workingenv` from Python

It can, under some circumstances, be useful to be able to activate a workingenv from Python. Here's a quick function to achieve that:

import sys
import os

def activate_workingenv(root):
    """Make modules in a self-contained workingenv available."""

    # Add the workingenv's bin directory to the front of PATH.
    bin_dir = os.path.join(root, 'bin')
    try:
        os.environ['PATH'] = os.path.pathsep.join([bin_dir, os.environ['PATH']])
    except KeyError:
        os.environ['PATH'] = bin_dir

    # Add its lib directory to the shared library search path.
    lib_dir = os.path.join(root, 'lib')
    try:
        os.environ['LD_LIBRARY_PATH'] = \
            os.path.pathsep.join([lib_dir, os.environ['LD_LIBRARY_PATH']])
    except KeyError:
        os.environ['LD_LIBRARY_PATH'] = lib_dir

    # Find the workingenv Python package root, e.g. lib/python2.5.
    python_version = '.'.join(map(str, sys.version_info[:2]))
    package_root = os.path.join(root, 'lib', 'python' + python_version)

    # Find and insert setuptools into sys.path
    sys.path.insert(0, package_root)
    real_setuptools = open(os.path.join(package_root,
                                        'setuptools.pth')).read().strip()
    sys.path.insert(0, os.path.join(package_root, real_setuptools))

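    # Load all distributions found in the workingenv into the working set.
    # Environment() expects a sequence of sys.path-style entries (a bare
    # string would be scanned character by character) and scans it itself.
    from pkg_resources import working_set, Environment

    env = Environment([package_root])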

    distributions, errors = working_set.find_plugins(env)
    for dist in distributions:
        working_set.add(dist)

    return distributions, errors

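It's UNIX-centric due to the use of LD_LIBRARY_PATH, but if you're not using shared libraries it's not really necessary anyway. Note that the dynamic linker reads LD_LIBRARY_PATH when a process starts, so setting it here only affects child processes, not libraries loaded into the running interpreter.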

Use it like so:

from activate_workingenv import activate_workingenv
activate_workingenv('./wenv')
import some_module_from_the_workingenv
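
For comparison, the standard library's site.addsitedir() covers the sys.path part of this. It adds a directory to sys.path and processes any .pth files inside it, which replaces the manual setuptools.pth handling. The sketch below is an alternative, not a drop-in replacement. It appends to sys.path rather than inserting at the front, leaves PATH and LD_LIBRARY_PATH alone, and skips the pkg_resources working set. activate_site_dir is a hypothetical name:

```python
import os
import site
import sys


def activate_site_dir(root):
    """Sketch: make a workingenv's lib/pythonX.Y directory importable.

    site.addsitedir() appends the directory to sys.path and then processes
    any .pth files in it (such as setuptools.pth), appending the existing
    paths they list.
    """
    python_version = '%d.%d' % sys.version_info[:2]
    package_root = os.path.join(root, 'lib', 'python' + python_version)
    site.addsitedir(package_root)
    return package_root
```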

Automatically activating `workingenv.py` environments on directory change

workingenv.py is a very useful tool for Python development. Quoting from its home page:

This tool creates an environment that is isolated from the rest of the Python installation, eliminating site-packages and any other source of modules, so that only the modules (and versions) you install into the environment will be available. This allows for isolated and controlled environments, as well as reproducibility.

To create, activate and deactivate an environment:

$ workingenv foo
$ . foo/bin/activate
(foo)$ deactivate
$

This is great, but it's even better when combined with On Dir.

I need to work on multiple versions of Trac (stable, trunk, branches, etc.) at the same time, so I keep each version in its own directory beneath ~/projects/trac. Each Trac instance is completely self-contained, so installing plugins in one will not affect the others.

So I use the following On Dir config to activate the workingenvs as I cd into each Trac directory.

enter ~/projects/trac/([^/]*)
    declare -F deactivate > /dev/null && deactivate
    activate=../env/$1/bin/activate
    test -r $activate && . $activate

leave ~/projects/trac
    declare -F deactivate > /dev/null && deactivate

Here's an example of me switching between environments. The last environment remains active until I leave the main Trac directory.

[aat@stalactite:~]cd projects/trac/trunk
(trunk)[aat@stalactite:~/projects/trac/trunk]cd ..
(trunk)[aat@stalactite:~/projects/trac]workingenv --site-packages ../env/stable
Updating working environment in /home/aat/projects/trac/env/stable
Installing local setuptools.................done.
(trunk)[aat@stalactite:~/projects/trac]cd stable/
(stable)[aat@stalactite:~/projects/trac/stable]cd ..
(stable)[aat@stalactite:~/projects/trac]cd trunk
(trunk)[aat@stalactite:~/projects/trac/trunk]cd ../..
[aat@stalactite:~/projects]

Of course, this can be extended to any project that needs its own independent Python environment, not just Trac.

Dallas, Texas - PyCon!

I've arrived in Dallas for PyCon. I'm very keen, yes indeed.

A few Trac hackers will be attending, and we're getting together for a BoF and a sprint. I can't wait to finally put faces to the names of those on trac-dev :)

I'm still not 100% decided on what talks to go to. My experience at LCA made me painfully aware that the subject matter is only a small part of what makes a talk interesting; the delivery is just as important. I'm trying to keep that in mind when selecting talks for PyCon.

I'm hoping to keep a running commentary of my experiences here, ostensibly for posterity but in reality because Alan has roped me into giving a run-down on it for SyPy ;)

pyndexter enhancements

Now that 0.2 is finally out the door, what next?

Firstly, making query term negation work in the default indexer, since not having it is suboptimal. My initial plan is to create a set class that supports lazily evaluated complements.

Next up, more extensive unit tests. Specifically, I want to test the UTF support of all the indexer adapters.
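
Below is a minimal sketch of the lazily evaluated complements mentioned above. It assumes each term lookup yields a set of document URIs. LazySet and resolve() are hypothetical names, not pyndexter's API. A negated term is stored as the set it excludes, and the boolean operators are rewritten with De Morgan-style identities. The complement is therefore only computed once, at the end, against the full document set:

```python
class LazySet(object):
    """A set that can also represent "everything except these items".

    The complement is never materialised until resolve() is given the
    universe (e.g. every document URI in the index).
    """

    def __init__(self, items=(), negated=False):
        self.items = frozenset(items)
        self.negated = negated

    def __contains__(self, item):
        return (item in self.items) != self.negated

    def __invert__(self):
        return LazySet(self.items, not self.negated)

    def __and__(self, other):
        if not self.negated and not other.negated:
            return LazySet(self.items & other.items)
        if not self.negated:                    # A & ~B == A - B
            return LazySet(self.items - other.items)
        if not other.negated:                   # ~A & B == B - A
            return LazySet(other.items - self.items)
        return LazySet(self.items | other.items, negated=True)  # ~(A | B)

    def __or__(self, other):
        if not self.negated and not other.negated:
            return LazySet(self.items | other.items)
        if not self.negated:                    # A | ~B == ~(B - A)
            return LazySet(other.items - self.items, negated=True)
        if not other.negated:                   # ~A | B == ~(A - B)
            return LazySet(self.items - other.items, negated=True)
        return LazySet(self.items & other.items, negated=True)  # ~(A & B)

    def resolve(self, universe):
        """Materialise as a concrete frozenset drawn from universe."""
        universe = frozenset(universe)
        if self.negated:
            return universe - self.items
        return self.items & universe
```

A query like "foo -bar" then becomes foo & ~bar, which reduces to a plain set difference without ever enumerating the documents that lack "bar".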

pyndexter 0.2 released

After many months of development, pyndexter 0.2 has been released. I'm much happier with the overall design of 0.2 than 0.1, although there are still some major features I'd like to add before I'm completely happy.

Terminal manipulation in Python

A seemingly oft-asked question is how to clear the console/terminal/screen in Python. This is really a subset of a larger question: how does one control and query the terminal from Python?

The basic steps are:

  1. Initialise curses:

import curses
curses.setupterm()

  2. Query a capability:

# Escape sequence used to clear the terminal
clear = curses.tigetstr('clear')
# Number of colours terminal supports
colours = curses.tigetnum('colors')
# Number of columns
columns = curses.tigetnum('cols')
# Number of lines
lines = curses.tigetnum('lines')

  3. Use a capability:

# Clear the screen
import sys
sys.stdout.write(clear)

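PS. Ideally, you'd check whether the capability queries succeeded: tigetstr() returns None for a missing capability, while tigetnum() returns a negative number rather than None.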
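
Below is a Python 3 sketch of the same steps with those checks in place. In Python 3 tigetstr() returns bytes, so the sequence is written to sys.stdout.buffer rather than sys.stdout. query_terminal() and clear_screen() are hypothetical helper names. Pointing setupterm() at os.devnull is an assumption of this sketch, so that the query also works when sys.stdout has no real file descriptor:

```python
import curses
import os
import sys


def query_terminal(term=None):
    """Return a dict of a few terminfo capabilities, or {} if unavailable.

    term=None means "use $TERM". CPython only performs the setup once per
    process, so later calls reuse whichever terminal was set up first.
    """
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        curses.setupterm(term=term, fd=fd)
    except curses.error:
        # No terminfo entry for this terminal type (or $TERM is unset).
        return {}
    finally:
        os.close(fd)
    # tigetstr() returns bytes, or None if the capability is absent.
    caps = {'clear': curses.tigetstr('clear')}
    for name in ('colors', 'cols', 'lines'):
        # tigetnum() signals "absent" with a negative number, not None.
        value = curses.tigetnum(name)
        caps[name] = value if value >= 0 else None
    return caps


def clear_screen():
    """Clear the screen if the current terminal supports it."""
    sequence = query_terminal().get('clear')
    if sequence:
        sys.stdout.flush()  # emit any pending text before the raw bytes
        sys.stdout.buffer.write(sequence)
        sys.stdout.buffer.flush()
```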

Merquery: Python full text indexing

There have been a few posts recently related to the creation of a Python abstraction layer for full text indexers, named Merquery.

Merquery is particularly interesting to me, as I started a similar module (named pyndexter, pronounced poindexter) after my experiences writing the RepoSearchPlugin for Trac (on Trac Hacks). The idea was that I would eventually port the plugin to this API in order to benefit from the efficiency of existing indexers.

The initial design I came up with for pyndexter consisted of the following high-level concepts:

URI
Each document is uniquely identified by a URI, e.g. file:///home/athomas/doc/some_doc.txt, mysql://username:password@host/database/table/, etc.
Document
A document is essentially just a block of text with a number of associated attributes, uniquely identified by its URI. Depending on the source of the document, it could contain additional attributes such as database column information.
Document Source
A document source is a class that knows how to retrieve documents for a specific URI scheme, determine whether a document needs to be reindexed, and traverse documents within the scheme, e.g. a FileSource object for file://, a MySQLSource for mysql://, and so on.
Indexer
The indexer, of course, performs the indexing of documents. It accepts a document object when indexing, and returns a set of URIs matching a search term when searching. Each indexing engine would have its own subclass of a base indexer class, customising its behaviour appropriately. Some indexers may have limitations on the way they accept data, in which case only a subset of the ideal would be available, e.g. an indexer that can only index local files would only accept document objects using the file:// scheme.
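
The four concepts above map naturally onto a few small classes. The minimal sketch below assumes whitespace tokenisation and AND semantics for multi-word queries. It leaves out reindex detection and stale-entry purging. Document, MemorySource, InMemoryIndexer and the mem:// scheme are hypothetical and are not pyndexter's actual API:

```python
class Document(object):
    """A block of text plus attributes, identified by its URI."""

    def __init__(self, uri, content, **attributes):
        self.uri = uri
        self.content = content
        self.attributes = attributes


class DocumentSource(object):
    """Retrieves and enumerates documents for one URI scheme."""

    def __iter__(self):
        """Yield the URI of every document this source knows about."""
        raise NotImplementedError

    def fetch(self, uri):
        """Return the Document for uri."""
        raise NotImplementedError


class MemorySource(DocumentSource):
    """Hypothetical 'mem://' scheme backed by a dict of name -> text."""

    prefix = 'mem://'

    def __init__(self, texts):
        self.texts = texts

    def __iter__(self):
        for name in sorted(self.texts):
            yield self.prefix + name

    def fetch(self, uri):
        return Document(uri, self.texts[uri[len(self.prefix):]])


class Indexer(object):
    """Base indexer: index Documents, search for sets of URIs."""

    def __init__(self, source=None):
        self.source = source

    def update(self):
        """Index every document in the attached source."""
        for uri in self.source:
            self.index(self.source.fetch(uri))

    def index(self, document):
        raise NotImplementedError

    def search(self, query):
        raise NotImplementedError


class InMemoryIndexer(Indexer):
    """Toy engine: an inverted index mapping each word to a set of URIs."""

    def __init__(self, source=None):
        Indexer.__init__(self, source)
        self.postings = {}

    def index(self, document):
        for word in set(document.content.lower().split()):
            self.postings.setdefault(word, set()).add(document.uri)

    def search(self, query):
        words = query.lower().split()
        if not words:
            return set()
        hits = set(self.postings.get(words[0], ()))
        for word in words[1:]:
            hits &= self.postings.get(word, set())
        return hits
```

An adapter for a real engine would subclass Indexer in the same way, translating index() and search() into that engine's own calls.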

The common case? Searching files

In the common case where you just want to index some files, simply instantiate a FileSource and pass it to an indexer:

import os
from pyndexter import *
from pyndexter.hyperestraier import HyperestraierIndexer
from pyndexter.file import FileSource

docs = FileSource(os.getcwd(), include=['*.py'])
indexer = HyperestraierIndexer('indexer.idx', docs)
indexer.update()

search = indexer.search(u'HyperestraierIndexer')
print len(search), 'hits'
for hit in search:
    doc = hit.document
    print hit.uri, doc.size,
    if hit.score:
        print hit.score,
    print doc.attributes.keys()

indexer.close()

Extensibility

For an application that needs to index custom data, it can either instantiate an indexer and feed it documents generated on the fly, or subclass the base DocumentSource class and implement its own URI scheme. In the former case the application can use its own unique document identifiers; the indexer doesn't care.

Something like the following might be sufficient for a mythical replacement for the RepoSearchPlugin (on Trac Hacks):

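from trac.versioncontrol.api import Node
from pyndexter import Document
from pyndexter.hyperestraier import HyperestraierIndexer

# self.env and req come from the surrounding Trac component and request.
repo = self.env.get_repository(req.authname)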

def walk_repo(node):
    if node.kind == Node.FILE:
        yield node
    elif node.kind == Node.DIRECTORY:
        for subnode in node.get_entries():
            for result in walk_repo(subnode):
                yield result

# Index the repository
hype = HyperestraierIndexer('/some/path/to/index/store')
for node in walk_repo(repo.get_node('/')):
    doc = Document(node.path, node.get_content())
    hype.index(doc)

# Search for some terms
for path in hype.search(u'cheese is good'):
    print path

As documents are being passed to the indexer manually, the caller will have to take care of purging invalid documents from the indexer.

Code

You can browse the source here, download a ZIP from here or check out the source with:

svn co http://swapoff.org/svn/pyndexter/trunk pyndexter-trunk

The example above should work fine.

It has adapters for Hyperestraier (via Hype) and Xapian (via Xapwrap).

To use the Xapian adapter, just s/Hyperestraier/Xapian/g and s/hyperestraier/xapian/g, then make sure you clear out the previous indexer.idx.

For the record, I much prefer Hype for both the intuitive and well designed API, and the indexing speed.