Moving house
I am moving my blog to Posterous and taking the opportunity to separate concerns.
Technical articles will be posted to http://swapoff.posterous.com while personal stuff (probably mostly rants) will go to http://alecthomas.posterous.com.
Emulating ActionScript's "with" statement in Python
If you haven't had much to do with ActionScript or Visual Basic, right now you might be thinking, as I did, "huh?" Basically the with statement in both of these languages allows one to access the attributes of an object as if they were in the local scope.
I know, I know. Why would you want to do this? I have no idea, but I thought it was cool that it was even possible in Python. However, there are some fairly major limitations:
- It only works in a global scope (I neglected to mention this originally, apologies - see #74 for details). One can s/f_locals/f_globals/g but this introduces its own problems.
- It only works for objects with a mutable __dict__.
I also thought the approach of injecting symbols into the context could be useful for providing DSL-like features. eg.
with query('a', 'b', 'c') as result:
    SELECT(a.foo, b.bar, c.baz)
    WHERE(AND(a.foo == 10, b.bar == 20))
    ORDER_BY(a.foo)
    GROUP_BY(b.bar)

for row in result:
    print row.foo, row.bar, row.baz
etc.
But enough speculation, here is an actual example. It requires at least Python 2.5.
And finally, here's the actual code.
from __future__ import with_statement

import inspect


class scope(object):
    """A context that maps an objects attributes into the current lexical
    scope, only for the duration of the context.

    NB. Properties will NOT work.

    >>> import math
    >>> with scope(math):
    ...     print sqrt(9)
    3.0
    >>> class Vector:
    ...     def __init__(self, x, y):
    ...         self.x = x
    ...         self.y = y
    ...     def length(self):
    ...         return math.sqrt(self.x ** 2 + self.y ** 2)
    >>> v = Vector(3, 3)
    >>> with scope(v):
    ...     print x, '%0.2f' % length()
    ...     x = 4
    ...     print x, '%0.2f' % length()
    3 4.24
    4 5.00
    >>> print v.x, v.y, v.length()
    4 3 5.0
    >>> print x
    Traceback (most recent call last):
    ...
    NameError: name 'x' is not defined
    """

    def __init__(self, obj):
        self.members = dict(inspect.getmembers(obj))
        self.obj = obj

    def __enter__(self):
        parent = inspect.currentframe().f_back
        # Preserve parent locals() and the objects dictionaries
        self.old_locals = dict(parent.f_locals)
        self.old_dict = self.obj.__dict__
        # Update locals with object members
        parent.f_locals.update(self.members)
        # If possible, replace the object's __dict__ with our updated locals.
        # Functions on the object that operate on the attributes will then
        # continue to work as expected. The downside is that all other objects
        # in the parents scope will be included as well.
        try:
            self.obj.__dict__ = parent.f_locals
            self.fake_dict = True
        except (TypeError, AttributeError):
            self.fake_dict = False

    def __exit__(self, type, value, traceback):
        parent = inspect.currentframe().f_back
        if self.fake_dict:
            object_dict = self.old_dict
        else:
            object_dict = self.obj.__dict__
        # Replicate member state from locals to object's __dict__.
        for key in self.members:
            if key in parent.f_locals:
                object_dict[key] = parent.f_locals[key]
                if key not in self.old_locals:
                    del parent.f_locals[key]
            else:
                del object_dict[key]
        if self.fake_dict:
            self.obj.__dict__ = object_dict
        del self.old_locals
        del self.old_dict
        return False


if __name__ == '__main__':
    import doctest
    doctest.testmod()
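The frame manipulation is the fragile part of this trick. As a rough illustration of the same idea without it, here is a minimal sketch, written for Python 3 rather than the Python 2 used in this post, that injects attributes into a namespace dict passed in explicitly. The name `explicit_scope` and the `namespace` argument are made up for this example, and it assumes the object has a plain `__dict__`.

```python
class explicit_scope:
    """Expose an object's public attributes as names in a namespace dict."""

    def __init__(self, obj, namespace):
        self.obj = obj
        self.namespace = namespace

    def __enter__(self):
        # Only public instance attributes are mapped in this sketch.
        self.names = [n for n in vars(self.obj) if not n.startswith('_')]
        # Remember any names we are about to shadow so they can be restored.
        self.saved = {n: self.namespace[n]
                      for n in self.names if n in self.namespace}
        for name in self.names:
            self.namespace[name] = getattr(self.obj, name)
        return self.obj

    def __exit__(self, exc_type, exc_value, traceback):
        for name in self.names:
            # Copy (possibly rebound) values back onto the object...
            if name in self.namespace:
                setattr(self.obj, name, self.namespace[name])
            # ...then restore or remove the injected name.
            if name in self.saved:
                self.namespace[name] = self.saved[name]
            else:
                self.namespace.pop(name, None)
        return False


class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y


v = Vector(3, 4)
ns = {}
with explicit_scope(v, ns):
    # Code evaluated against ns sees x and y as bare names.
    length = eval('(x ** 2 + y ** 2) ** 0.5', ns)
    exec('x = 6', ns)
```

At module level one could pass `globals()` as the namespace, which is roughly what the `f_globals` variant of the original does implicitly.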
Walking a py3k AST
The compiler module has been removed from py3k. Fortunately there's a replacement in the _ast module introduced in Python 2.5. Unfortunately, while the compiler module has a useful __repr__esentation for its AST objects...
This is not the case for the _ast module, so here's a function that will dump _ast.AST objects as a dict:
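The original function is not reproduced in this text. As a stand-in, here is a minimal sketch of how such a dump could look, using the public `ast` module (which wraps `_ast`) and written for Python 3. The function name `ast_to_dict` and the `_type` key are choices made for this example, not necessarily what the original used.

```python
import ast


def ast_to_dict(node):
    """Recursively convert an AST node into plain dicts and lists."""
    if isinstance(node, ast.AST):
        result = {'_type': node.__class__.__name__}
        # iter_fields yields (field name, value) for each field on the node.
        for name, value in ast.iter_fields(node):
            result[name] = ast_to_dict(value)
        return result
    if isinstance(node, list):
        return [ast_to_dict(item) for item in node]
    # Strings, numbers and None are leaves; return them unchanged.
    return node


tree = ast.parse('x = 1 + 2')
dumped = ast_to_dict(tree)
```

The resulting structure can be passed straight to `pprint.pprint` for inspection.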
Type masquerading using dynamic base classes
Python allows us to construct new classes on-the-fly with type(). By (ab)using this feature we can construct a class that preserves its own interface while assuming all of the behaviour of an existing object. Useful for wrapping compound types such as dictionaries, lists, sets, etc. without having to proxy __getitem__, __iter__ and so on, or write a custom __getattr__.
One example of when this could be useful is a JSON-specific HTTP client object that assumes the type of the JSON data, while maintaining response-specific attributes such as headers and status:
import simplejson


class Response(object):
    """A response object that masquerades as the decoded content type."""

    def __new__(cls, content=None, headers=None, status=None):
        if content is not None:
            content = simplejson.loads(content)
            assumed_type = type(content)
            bases = (Response, assumed_type)
            name = assumed_type.__name__.title() + 'Response'
            cls = type(name, bases, {})
            self = assumed_type.__new__(cls, content)
            self.assumed_type = assumed_type
        else:
            self = object.__new__(cls)
        return self

    def __init__(self, content=None, headers=None, status=None):
        if content is not None:
            # __init__ receives the raw JSON text again, so decode it before
            # handing it to the assumed type's initialiser.
            super(Response, self).__init__(simplejson.loads(content))
        self.headers = headers
        self.status = status


json_data = [
    '{"foo": 1, "bar": 2}',
    '123',
    '123.5',
    '["foo", "bar"]',
]

for data in json_data:
    response = Response(data, headers=[('Content-Type', 'application/json')],
                        status=200)
    decoded = simplejson.loads(data)
    print response
    print ' Same type?', isinstance(response, type(decoded))
    try:
        print ' Iteration:',
        print [i for i in response]
    except TypeError:
        print '(type does not support iteration)'
    print ' Headers:', response.headers
    print ' Status:', response.status
    print
Outputs this:
{u'foo': 1, u'bar': 2}
Same type? True
Iteration: [u'foo', u'bar']
Headers: [('Content-Type', 'application/json')]
Status: 200
123
Same type? True
Iteration: (type does not support iteration)
Headers: [('Content-Type', 'application/json')]
Status: 200
123.5
Same type? True
Iteration: (type does not support iteration)
Headers: [('Content-Type', 'application/json')]
Status: 200
[u'foo', u'bar']
Same type? True
Iteration: [u'foo', u'bar']
Headers: [('Content-Type', 'application/json')]
Status: 200
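Stripped of the JSON handling, the core of the trick fits in a few lines. The following is a minimal sketch for Python 3; `masquerade` and the `Wrapper` suffix are names invented for this example, and it assumes the value's type can be subclassed and constructed from an instance of itself (true for dict, list, int, float and str, but not for bool or None).

```python
def masquerade(value, **extras):
    """Return an object that behaves like value but carries extra attributes."""
    base = type(value)
    # Build a subclass of the value's own type on the fly.
    cls = type(base.__name__.title() + 'Wrapper', (base,), {})
    obj = cls(value)
    # The dynamic subclass has a __dict__, so arbitrary attributes can be set.
    for name, attr in extras.items():
        setattr(obj, name, attr)
    return obj


wrapped_list = masquerade(['foo', 'bar'], status=200)
wrapped_int = masquerade(123, status=404)
```

Because the wrapper really is a subclass, `isinstance` checks, iteration and arithmetic all come for free rather than being proxied.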
CLY 0.9 released
CLY is a project I've been working on for a while now, and I've finally gotten around to releasing it.
It's basically a CLI parser and grammar constructor that lets you easily add command-line interfaces to your applications:
echo.py:
from cly import *

def echo(text):
    print text

grammar = Grammar(
    echo=Node(help='Echo text')(
        text=Variable(help='Text to echo', pattern=r'.+')(
            Action(callback=echo),
        ),
    ),
)

interact(grammar)
CLY automatically generates contextual help and provides tab completion (fully customisable). If you ran the above program it would work like this:
cly> ?
echo Echo text
cly> echo
^ more input required (expected <text>)
cly> echo ?
<text> Text to echo
cly> echo some text
some text
Grammars can also be defined in XML. Here's the above example rewritten to use an XML grammar:
echo.xml:
<?xml version="1.0"?>
<grammar xmlns="http://swapoff.org/cly/xml">
  <node name="echo" help="Echo text">
    <variable name="text" pattern=".+" help="Text to echo">
      <action callback="echo"/>
    </variable>
  </node>
</grammar>
echo.py:
from cly import *

def echo(text):
    print text

grammar = Grammar.from_xml(open('echo.xml').read(), echo=echo)
interact(grammar)
More examples are available in the tutorial, the developer's guide and the API documentation.
Adding support for a workingenv sandbox to setuptools/distutils
It seems that all I blog about is workingenv.
This time it's a snippet of code that adds a "sandbox" command to distutils, which automatically creates a workingenv from the setuptools extras_require, install_requires and dependency_links options listed in your setup() call. It only supports *nix systems for now, but could easily be extended to support Windows.
Here's the code:
import os
from setuptools import setup, find_packages
from distutils.cmd import Command


class sandbox(Command):
    description = 'Create a development sandbox using workingenv'
    user_options = [
        ('path=', None, 'workingenv path'),
        ('extras', None, 'also include "extras" requirements'),
    ]

    def initialize_options(self):
        self.path = 'wenv'
        self.extras = False

    def finalize_options(self):
        pass

    def run(self):
        requires = open('requirements.txt', 'w')
        try:
            requirements = self.distribution.dependency_links + \
                           self.distribution.install_requires
            if self.extras:
                extras = self.distribution.extras_require or {}
                # Each extras_require value is a requirement string or a
                # list of them, so flatten the values into one list.
                for extra in extras.values():
                    if isinstance(extra, basestring):
                        extra = [extra]
                    requirements += extra
            requires.write('\n'.join(requirements))
        finally:
            requires.close()
        cwd = os.getcwd()
        import workingenv
        workingenv.main(['--always-unzip', '--requirements=requirements.txt',
                         '--site-packages', '--verbose', self.path])
        os.chdir(cwd)
        os.symlink(self.path + '/bin/activate', 'sandbox')
        print
        print 'XXX: Use ". sandbox" to activate the development sandbox'


setup(
    name='MyCoolPackage',
    version='0.0.0.1',
    packages=find_packages(),
    # Add the sandbox command
    cmdclass={'sandbox': sandbox},
    # Search some extra locations for dependencies
    dependency_links=[
        'http://svn.edgewall.org/repos/genshi/trunk#egg=Genshi-dev',
        'http://trac.pocoo.org/repos/werkzeug/trunk#egg=Werkzeug-dev',
        'http://svn.sqlalchemy.org/sqlalchemy/trunk#egg=SQLAlchemy-dev',
    ],
    install_requires=[
        'setuptools >= 0.6b1',
        'Genshi >= 0.5.dev-r698,==dev',
        'Werkzeug >= 0.1.dev-r3831,==dev',
        'SQLAlchemy >= 0.4.0.dev-r3203,==dev',
        'AuthKit >= 0.3.0pre5',
    ],
)
And here's an example of how to use it:
$ python setup.py sandbox --help
Common commands: (see '--help-commands' for more)
...
Options for 'sandbox' command:
  --path    workingenv path
  --extras  also include "extras" requirements
...
$ python setup.py sandbox --path=mysandbox --extras
running sandbox
Reading requirement requirements.txt
Making working environment in /home/athomas/p/test/mysandbox
  Creating lib/python2.5
...
...Installing http://svn.edgewall.org/repos/genshi/trunk#egg=Genshi-dev, http://trac.pocoo.org/repos/werkzeug/trunk#egg=Werkzeug-dev, http://svn.sqlalchemy.org/sqlalchemy/trunk#egg=SQLAlchemy-dev, setuptools >= 0.6b1, Genshi >= 0.5.dev-r698,==dev, Werkzeug >= 0.1.dev-r3831,==dev, SQLAlchemy >= 0.4.0.dev-r3203,==dev, AuthKit >= 0.3.0pre5
...done.

XXX: Use ". sandbox" to activate the development sandbox
April SyPy Presentation on PyCon
This month's SyPy meeting attracted a larger number of people than usual: around 35 or so (compared to the usual 5 or 6).
I gave a talk about my trip to PyCon, which was received fairly well I think (I judge it a success by not being booed off the stage). S5 version is here.
Andrew Bennets gave a pretty interesting talk on Bazaar, although a lot of it was an introduction to distributed VCSes in general. One interesting aspect of Bazaar in particular is the plugin system.
Activating a `workingenv` from Python
It can, under some circumstances, be useful to be able to activate a workingenv from Python. Here's a quick function to achieve that:
import sys
import os


def activate_workingenv(root):
    """Make modules in a self-contained workingenv available."""
    # Add ./bin directory to path.
    bin_dir = os.path.join(root, './bin')
    try:
        os.environ['PATH'] = os.path.pathsep.join([bin_dir, os.environ['PATH']])
    except KeyError:
        os.environ['PATH'] = bin_dir

    # Add ./lib to linker path
    lib_dir = os.path.join(root, './lib')
    try:
        os.environ['LD_LIBRARY_PATH'] = \
            os.path.pathsep.join([lib_dir, os.environ['LD_LIBRARY_PATH']])
    except KeyError:
        os.environ['LD_LIBRARY_PATH'] = lib_dir

    # Find the workingenv Python package root
    python_version = '.'.join(map(str, sys.version_info[:2]))
    package_root = os.path.join(root, './lib/python' + python_version)

    # Find and insert setuptools into sys.path
    sys.path.insert(0, package_root)
    real_setuptools = open(os.path.join(package_root, 'setuptools.pth')).read().strip()
    sys.path.insert(0, os.path.join(package_root, real_setuptools))

    # Load all distributions into the working set.
    from pkg_resources import working_set, Environment
    env = Environment(root)
    env.scan()
    distributions, errors = working_set.find_plugins(env)
    for dist in distributions:
        working_set.add(dist)
    return distributions, errors
It's UNIX-centric due to the use of LD_LIBRARY_PATH, but if you're not using shared libraries it's not really necessary anyway.
Use it like so:
from activate_workingenv import activate_workingenv

activate_workingenv('./wenv')

import some_module_from_the_workingenv
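The PATH and LD_LIBRARY_PATH handling in the function follows the same "prepend, or set if missing" pattern twice. As a small aside, that pattern could be factored out along these lines. This is a Python 3 sketch; `prepend_path` and its `environ` parameter are invented for this example, and unlike the original it treats an empty variable the same as a missing one.

```python
import os


def prepend_path(name, directory, environ=None):
    """Put directory at the front of a path-list environment variable."""
    # Default to the real environment; a plain dict can be passed for testing.
    environ = os.environ if environ is None else environ
    current = environ.get(name)
    if current:
        environ[name] = os.pathsep.join([directory, current])
    else:
        environ[name] = directory
    return environ[name]


# Demonstrated against a plain dict so the real environment is untouched.
env = {'PATH': '/usr/bin'}
prepend_path('PATH', './wenv/bin', env)
prepend_path('LD_LIBRARY_PATH', './wenv/lib', env)
```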
Automatically activating `workingenv.py` environments on directory change
workingenv.py is a very useful tool for Python development. Quoting from its home page:
This tool creates an environment that is isolated from the rest of the Python installation, eliminating site-packages and any other source of modules, so that only the modules (and versions) you install into the environment will be available. This allows for isolated and controlled environments, as well as reproduceability.
To create, activate and deactivate an environment:
$ workingenv foo
$ . foo/bin/activate
(foo)$ deactivate
$
This is great, but it gets even better when combined with On Dir.
I need to work on multiple versions of Trac (stable, trunk, branches, etc.) at the same time, so I keep each version in its own directory beneath ~/projects/trac. Each Trac instance is completely self-contained, so installing plugins in one will not affect the others.
So I use the following On Dir config to activate the workingenvs as I cd into each Trac directory.
enter ~/projects/trac/([^/]*)
declare -F deactivate > /dev/null && deactivate
activate=../env/$1/bin/activate
test -r $activate && . $activate
leave ~/projects/trac
declare -F deactivate > /dev/null && deactivate
Here's an example of me switching between environments. The last environment remains active until I leave the main Trac directory.
[aat@stalactite:~]cd projects/trac/trunk
(trunk)[aat@stalactite:~/projects/trac/trunk]cd ..
(trunk)[aat@stalactite:~/projects/trac]workingenv --site-packages ../env/stable
Updating working environment in /home/aat/projects/trac/env/stable
Installing local setuptools.................done.
(trunk)[aat@stalactite:~/projects/trac]cd stable/
(stable)[aat@stalactite:~/projects/trac/stable]cd ..
(stable)[aat@stalactite:~/projects/trac]cd trunk
(trunk)[aat@stalactite:~/projects/trac/trunk]cd ../..
[aat@stalactite:~/projects]
Of course, this can be extended to any project that needs its own independent Python environment, not just Trac.
Dallas, Texas - PyCon!
I've arrived in Dallas for PyCon. I'm very keen, yes indeed.
A few Trac hackers will be attending, and we're getting together for a BoF and a sprint. I can't wait to finally put faces to the names of those on trac-dev :)
I'm still not 100% decided on what talks to go to. My experience at LCA made me painfully aware that the subject matter is only a small part of what makes a talk interesting; the delivery is just as important. I'm trying to keep that in mind when selecting talks for PyCon.
I'm hoping to keep a running commentary of my experiences here. Ostensibly for posterity but in reality because I've been roped into giving a run-down on it for SyPy, by Alan ;)
pyndexter enhancements
Now that 0.2 is finally out the door, what next?
Firstly, making query term negation work in the default indexer. Not having this is suboptimal. My initial thought on how to solve this is to create a set class with support for lazily evaluated complements.
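To make the idea concrete, here is a minimal sketch of what a set with lazily evaluated complements might look like. This is not pyndexter code; the class name `LazySet` and the method `resolve` are invented for this example. A negated set is stored as "everything except these items", so negation never has to enumerate the whole index, and intersections and unions follow De Morgan's laws.

```python
class LazySet:
    """A set stored either as its members or as the complement of a set."""

    def __init__(self, items=(), negated=False):
        self.items = frozenset(items)
        self.negated = negated

    def __invert__(self):
        # Negation just flips the flag; nothing is enumerated.
        return LazySet(self.items, not self.negated)

    def __contains__(self, item):
        return (item in self.items) != self.negated

    def __and__(self, other):
        if not self.negated and not other.negated:
            return LazySet(self.items & other.items)
        if not self.negated:
            # A & ~B == A - B
            return LazySet(self.items - other.items)
        if not other.negated:
            # ~A & B == B - A
            return LazySet(other.items - self.items)
        # ~A & ~B == ~(A | B)
        return LazySet(self.items | other.items, negated=True)

    def __or__(self, other):
        if not self.negated and not other.negated:
            return LazySet(self.items | other.items)
        if self.negated and other.negated:
            # ~A | ~B == ~(A & B)
            return LazySet(self.items & other.items, negated=True)
        if self.negated:
            # ~A | B == ~(A - B)
            return LazySet(self.items - other.items, negated=True)
        # A | ~B == ~(B - A)
        return LazySet(other.items - self.items, negated=True)

    def resolve(self, universe):
        """Materialise the set against an explicit universe of items."""
        universe = frozenset(universe)
        if self.negated:
            return universe - self.items
        return universe & self.items


# Document ids matching two hypothetical search terms.
cheese = LazySet({1, 2, 3})
wine = LazySet({2, 3, 4})
cheese_not_wine = cheese & ~wine
not_cheese_or_wine = ~cheese | wine
```

A query such as "cheese but not wine" never needs the universe at all; only a result that is still negated at the end has to be resolved against the full list of documents.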
Next up, more extensive unit tests. Specifically, I want to test the UTF support of all the indexer adapters.
pyndexter 0.2 released
After many months of development, pyndexter 0.2 has been released. I'm much happier with the overall design of 0.2 than 0.1, although there are still some major features I'd like to add before I'm completely happy.
Terminal manipulation in Python
A seemingly oft-asked question is how to clear the console/terminal/screen in Python. This is really a subset of a larger question: how does one control and query the terminal from Python?
The basic steps are:
- Initialise curses
import curses
curses.setupterm()
- Query a capability
# Escape sequence used to clear the terminal
clear = curses.tigetstr('clear')
# Number of colours terminal supports
colours = curses.tigetnum('colors')
# Number of columns
columns = curses.tigetnum('cols')
# Number of lines
lines = curses.tigetnum('lines')
- Use a capability
# Clear the screen
import sys
sys.stdout.write(clear)
PS. Ideally, you'd check the results of the capability queries: tigetstr returns None for a missing capability, while tigetnum returns a negative number.
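Putting the three steps together with those checks might look something like the sketch below. It is written for Python 3, where tigetstr returns bytes; the function names `query_terminal` and `clear_screen` are made up for this example, and it assumes file descriptor 1 is the terminal.

```python
import curses
import sys


def query_terminal(fd=1):
    """Return a dict of capabilities, or None if terminfo is unavailable."""
    try:
        curses.setupterm(fd=fd)
    except curses.error:
        # TERM is unset or has no terminfo entry.
        return None

    def number(capname):
        # tigetnum returns -1 or -2 when the capability is absent or
        # is not numeric; normalise both to None.
        value = curses.tigetnum(capname)
        return value if value >= 0 else None

    return {
        'clear': curses.tigetstr('clear'),  # bytes, or None if unsupported
        'colours': number('colors'),
        'columns': number('cols'),
        'lines': number('lines'),
    }


def clear_screen(stream=sys.stdout):
    """Clear the terminal if it supports it; return True on success."""
    caps = query_terminal()
    if not caps or caps['clear'] is None:
        return False
    stream.write(caps['clear'].decode('latin-1'))
    stream.flush()
    return True


caps = query_terminal()
```

Callers can then fall back to something simpler, such as printing blank lines, when `clear_screen()` returns False.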
Why OSX is not for me
So I was provided with a 15" MacBook Pro when I started my new job and have been using the machine for four weeks now.
Here is why I've decided to trade the MacBook in for a Thinkpad running Linux and Openbox, along with some of the things I actually did like (though not enough ;)). Given the amount of time invested I don't think I'm simply making an arbitrary decision.
The Bad
No per-desktop tab-cycle
I started off by installing and configuring Virtue Desktops, figuring it would be less jarring for me given my preceding environment. The problem for me is that all applications are still in the tab-cycle, no matter what desktop you're on.
Under Openbox, if you haven't guessed by now, each desktop has its own distinct tab-cycle, making each desktop much more its own environment. For example, I typically use desktop 3 for E-Mail. I open up a persistent terminal to run Mutt, and applications for viewing attachments (Acrobat, Open Office, etc.) are all contained in that desktop. I will also open up temporary terminals for dealing with mail related stuff, such as removing old saved attachments etc.
This means that Virtue desktops are really not that different from normal OSX tab-cycling as far as I'm concerned. Consequently, I ditched Virtue quite early and started just using iKey and normal alt-tab.
Terminals are not "Single-use"
My model of work, which I expect is not unique, is to have a set of terminal windows open for long-term tasks like development. For shorter tasks though, I'll open up a new terminal, do my thing, then close it. For example if I am reading a web page which mentions a feature of a utility which I've not used, I'll often open up a shell and try it out, then close it when I'm done.
OSX makes this kind of model prohibitively difficult by aggregating all terminal windows under one application. If you alt-tab to Terminal.app it will bring all the terminal windows to the fore, completely obscuring whatever you were observing.
The only way I found that came even close to working around this problem was by making a duplicate of the Terminal application and launching it. This will give you two completely distinct sets of terminal windows, though obviously this doesn't scale well.
Keyboard Shortcuts
The keyboard shortcuts under OSX are completely different. Ctrl-right does not go to the next word, page up seems to move the cursor but not the view, etc. Minor annoyances really, as you can just learn the new OSX bindings or rebind them if you're keen. But yet another hurdle to overcome.
Shareware
This is more of a cultural difference, but under Linux you can without fail find an open source application to do exactly what you want. As a developer, this is a godsend. If there are annoying bugs in the application, you can just fix it yourself. Conversely under OSX, Free Software is not all that common - vastly outnumbered by binary-only Shareware.
The range of software also appears to be a lot more limited. That being said, a lot of the software is of very high quality: Omnigraffle, Virtue Desktops, iKey, Adium, etc.
Hardware Quality
First revision hardware is typically riddled with problems, so the issues people have had with the MacBook Pro hardware are really to be expected. I myself experienced the extremely dodgy lid button.
The Good
Hardware Design Touches
The MacBook Pro hardware, though not without its issues, has some very nice features: the iSight camera, magnetic cable, nice screen, keyboard backlight, slot-load DVD drive.
True Transparency
I like being able to see what is behind a transparent terminal. It is mostly just cool, but occasionally actually useful :)
Dashboard
Dashboard is cool. I particularly like the dictionary application, although it being US-specific is a bit frustrating.
The Nondescript
Expose
The only feature of Expose I actually use is the reveal desktop feature. The application switching feature, while pretty, is a lot slower than alt-tabbing or iKey application shortcuts.
Merquery: Python full text indexing
There have been a few posts recently related to the creation of a Python abstraction layer for full text indexers, named Merquery.
Merquery is particularly interesting to me, as I started a similar module (named pyndexter, pronounced poindexter) after my experiences writing the Trac trachacks:RepoSearchPlugin. The idea was that I would eventually port the plugin to this API in order to benefit from the efficiency of existing indexers.
The initial design I came up with for pyndexter consisted of the following high-level concepts:
- URI
- Each document is uniquely identified by a URI. eg. file:///home/athomas/doc/some_doc.txt, mysql://username:password@host/database/table/, etc.
- Document
- A document is essentially just a block of text with a number of associated attributes, uniquely identified by its URI. Depending on the source of the document, it could contain additional attributes such as database column information, etc.
- Document Source
- A document source is a class that knows how to retrieve documents for a specific URI scheme, determine whether a document needs to be reindexed, and traverse documents within the scheme. eg. a FileSource object for file://, a MySQLSource for mysql://, and so on.
- Indexer
- The indexer, of course, performs the indexing of documents. It accepts a document object when indexing, and returns a set of URIs matching a search term when searching. Each indexing engine would have its own subclass of a base indexer class, customising its behaviour appropriately. Some indexers may have limitations on the way they accept data, in which case only a subset of the ideal would be available. eg. An indexer that can only index local files would only accept document objects using the file:// scheme.
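To show how these concepts fit together, here is a toy, in-memory sketch in Python 3. It is not pyndexter's actual API: the classes `DictSource` and `MemoryIndexer`, the `mem://` scheme and the `fetch` method are all invented for illustration, and the "index" is just a dict mapping words to sets of URIs.

```python
class Document:
    """A block of text with attributes, identified by its URI."""

    def __init__(self, uri, content, **attributes):
        self.uri = uri
        self.content = content
        self.attributes = attributes


class DictSource:
    """Hypothetical document source for a made-up mem:// scheme."""

    prefix = 'mem:///'

    def __init__(self, texts):
        self.texts = texts

    def __iter__(self):
        # Traverse every document available under this scheme.
        for name in sorted(self.texts):
            yield self.prefix + name

    def fetch(self, uri):
        text = self.texts[uri[len(self.prefix):]]
        return Document(uri, text, size=len(text))


class MemoryIndexer:
    """Toy indexer: accepts documents, returns sets of matching URIs."""

    def __init__(self, source=None):
        self.source = source
        self.terms = {}

    def index(self, document):
        for word in document.content.lower().split():
            self.terms.setdefault(word, set()).add(document.uri)

    def update(self):
        # Pull every document from the source and index it.
        for uri in self.source:
            self.index(self.source.fetch(uri))

    def search(self, query):
        hits = None
        # A document matches only if it contains every query word.
        for word in query.lower().split():
            uris = self.terms.get(word, set())
            hits = set(uris) if hits is None else hits & uris
        return hits or set()


source = DictSource({'a.txt': 'Cheese is good', 'b.txt': 'Wine is good too'})
indexer = MemoryIndexer(source)
indexer.update()
results = indexer.search('is good')
```

A real adapter would delegate `index` and `search` to the underlying engine rather than keeping its own word map.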
The common case? Searching files
In the common case where you just want to index some files, simply instantiate a FileSource and pass it to an indexer:
import os
from pyndexter import *
from pyndexter.hyperestraier import HyperestraierIndexer
from pyndexter.file import FileSource

docs = FileSource(os.getcwd(), include=['*.py'])
indexer = HyperestraierIndexer('indexer.idx', docs)
indexer.update()
search = indexer.search(u'HyperestraierIndexer')
print len(search), 'hits'
for hit in search:
    doc = hit.document
    print hit.uri, doc.size,
    if hit.score:
        print hit.score,
    print doc.attributes.keys()
indexer.close()
Extensibility
An application that needs to index custom data can either instantiate an indexer and feed it documents generated on the fly, or subclass the base DocumentSource class and implement its own URI scheme. In the former case the application can use its own unique document identifiers; the indexer doesn't care.
Something like the following might be sufficient for a mythical trachacks:RepoSearchPlugin replacement:
from pyndexter import Document
from pyndexter.hyperestraier import HyperestraierIndexer

repo = self.env.get_repository(req.authname)

def walk_repo(node):
    if node.kind == Node.FILE:
        yield node
    elif node.kind == Node.DIRECTORY:
        for subnode in node.get_entries():
            for result in walk_repo(subnode):
                yield result

# Index the repository
hype = HyperestraierIndexer('/some/path/to/index/store')
for node in walk_repo(repo.get_node('/')):
    doc = Document(node.path, node.get_content())
    hype.index(doc)

# Search for some terms
for path in hype.search(u'cheese is good'):
    print path
As documents are being passed to the indexer manually, the caller will have to take care of purging invalid documents from the indexer.
Code
You can browse the source here, download a ZIP from here or check out the source with:
svn co http://swapoff.org/svn/pyndexter/trunk pyndexter-trunk
The example above should work fine.
It has adapters for Hyperestraier (via Hype) and Xapian (via Xapwrap).
To use the Xapian adapter, just s/Hyperestraier/Xapien/g and s/hyperestraier/xapien/g, then make sure you clear out the previous indexer.idx.
For the record, I much prefer Hype for both the intuitive and well designed API, and the indexing speed.
