Creating a Virtual Filesystem with Python (and why you need one)

If you are writing an application of any size, it will most likely require a number of files to run – files which could be stored in a variety of possible locations. Furthermore, you will probably want to be able to change the location of those files when debugging and testing. You may even want to store those files somewhere other than the user's hard drive.

Any engineer worth his salt will recognise that the file locations should be stored in some kind of configuration file and the code to read the files in question should be factored out so that it isn't just scattered at points where data is read or written. In this post I'll present a way of doing just that by creating a virtual filesystem with PyFilesystem.

You'll need the most recent version of PyFilesystem from SVN to run this code.

We're going to create a virtual filesystem for a fictitious application that requires per-application and per-user resources, as well as a location for cache and log files. I'll also demonstrate how to mount files located on a web server. Here's the code:

from fs.opener import fsopendir
app_fs = fsopendir('mount://fs.ini', create_dir=True)

That's all there is to it: two lines of code (one if you don't count the import). Obviously there is quite a bit going on under the hood here, which I'll explain below, but let's see what this code gives you…

The app_fs object is an interface to a single filesystem that contains all the file locations our application will use. For example, the path /user/app.ini references a per-user file, whereas /resources/logo.png references a per-application file. The actual physical location of the data is irrelevant because, as far as your application is concerned, the paths never change. This abstraction is useful because the real path for such files varies according to the platform the code is running on; Windows, Mac and Linux all have different conventions, and if you put your files in the wrong place, your app will likely break on one platform or another.
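To make the idea concrete, here is a minimal sketch of what a mount table does, using only the standard library. It is not PyFilesystem's implementation; the `MiniMountFS` class and its `mount`/`resolve`/`open` methods are hypothetical names chosen for illustration. The first component of a virtual path selects a mount point, and the remainder is joined onto that mount's real directory.

```python
import builtins
import os
import posixpath
import tempfile


class MiniMountFS:
    """Hypothetical, minimal mount table: maps top-level virtual
    directories (e.g. 'user') onto real directories on disk."""

    def __init__(self):
        self._mounts = {}

    def mount(self, name, real_dir):
        self._mounts[name] = real_dir

    def resolve(self, virtual_path):
        # Normalise the virtual path; a leading '/' makes normpath drop
        # any '..' that would climb above the root.
        norm = posixpath.normpath("/" + virtual_path).lstrip("/")
        mount_name, _, rest = norm.partition("/")
        if mount_name not in self._mounts:
            raise FileNotFoundError(virtual_path)
        parts = rest.split("/") if rest else []
        return os.path.join(self._mounts[mount_name], *parts)

    def open(self, virtual_path, mode="r"):
        return builtins.open(self.resolve(virtual_path), mode)


# Usage: the application only ever sees '/user/app.ini'; where that
# file lives on disk is decided by the mount table.
demo_root = tempfile.mkdtemp()
app_fs = MiniMountFS()
app_fs.mount("user", demo_root)
with app_fs.open("/user/app.ini", "w") as f:
    f.write("[app]\nname = myapp\n")
```

Swapping `demo_root` for another directory changes where the data lives without touching any code that opens `/user/app.ini`.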

Here's how a per-user configuration file might be opened:

from ConfigParser import ConfigParser
# The 'safeopen' method works like 'open', but will return an
# empty file-like object if the path does not exist
with app_fs.safeopen('/user/app.ini') as ini_file:
    cfg = ConfigParser()
    cfg.readfp(ini_file)
    # ... do something with cfg
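The behaviour of `safeopen` described in the comment above can be approximated with the standard library alone. This sketch is an assumption about the semantics, not PyFilesystem's code: `safe_open` is a hypothetical helper that returns an empty in-memory file when the path is missing, so the parsing code needs no special case for a first run. It uses Python 3's `configparser`, where `read_file` replaces the deprecated `readfp`.

```python
import configparser
import io


def safe_open(path, binary=False):
    """Open `path` for reading; if it does not exist, return an empty
    in-memory file instead of raising (hypothetical helper)."""
    try:
        return open(path, "rb" if binary else "r")
    except FileNotFoundError:
        return io.BytesIO() if binary else io.StringIO()


def load_user_config(path):
    cfg = configparser.ConfigParser()
    with safe_open(path) as ini_file:
        # read_file() is the Python 3 replacement for readfp()
        cfg.read_file(ini_file)
    return cfg


# A missing file simply yields a config with no sections.
cfg = load_user_config("/nonexistent/app.ini")
```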

The files in our virtual filesystem don't even have to reside on the local filesystem. For instance, /live/ may actually reference a location on the web, where the version of the current release and a short ‘message of the day’ are stored.

Here's how the version number and MOTD might be read:

def get_application_version():
    """Get the version number of the most up to date version of the application,
    as a tuple of three integers, or None if it could not be read"""
    with app_fs.safeopen('/live/version.txt') as version_file:
        version_text = version_file.read().rstrip()
    if not version_text:
        # Empty file or unable to read
        return None
    # Keep at most the first three components, e.g. '1.2.3' -> (1, 2, 3)
    return tuple(int(v) for v in version_text.split('.')[:3])

def get_motd():
    """Get a welcome message"""
    with app_fs.safeopen('/live/motd.txt') as motd_file:
        return motd_file.read().rstrip()

You'll notice that even though the actual data is retrieved over HTTP (in the release configuration, /live/ is mounted from a web server), the code would be no different if the files were stored locally.
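Returning the version as a tuple of integers rather than a string pays off when you compare it with the running version: tuples compare element by element, so 1.10.0 correctly sorts after 1.9.3, whereas the strings would not. A small sketch, where `parse_version` mirrors the parsing in `get_application_version` and `INSTALLED_VERSION` and `update_available` are hypothetical names:

```python
INSTALLED_VERSION = (1, 9, 3)  # hypothetical: the version currently running


def parse_version(version_text):
    """Parse 'major.minor.patch' into a tuple of ints, or None if empty."""
    version_text = version_text.strip()
    if not version_text:
        return None
    return tuple(int(v) for v in version_text.split(".")[:3])


def update_available(latest_text):
    latest = parse_version(latest_text)
    # A missing or unreadable version file means "no update known".
    return latest is not None and latest > INSTALLED_VERSION
```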

So how is all this behaviour created from a single line of code? The line fsopendir("mount://fs.ini", create_dir=True) opens a MountFS from the information contained within an INI file (create_dir=True will create specified directories if they don't exist). Here's an example of an INI file that could be used during development:

[fs]
user=./user
resources=./resources
logs=./logs
cache=./user/cache
live=./live

The INI file is used to construct a MountFS, where the keys in the [fs] section are the top-level directory names and the values are the real locations of the files. In the above example, /user/ maps on to a directory called user relative to the current directory, but it could be changed to an absolute path or to a location on a server (e.g. FTP, SFTP, HTTP, DAV), or even to a directory within a zip file.
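How the `[fs]` section becomes a mount table can be sketched with `configparser`. This illustrates the idea under simplifying assumptions and is not the mount opener's actual code: `build_mount_table` is hypothetical, it treats every value without a `://` scheme as a local directory (resolved against the current directory, and created when `create_dir` is true), and it keeps opener URLs as plain strings rather than opening them. Note that `configparser` lowercases keys by default.

```python
import configparser
import os


def build_mount_table(ini_text, section="fs", create_dir=False):
    """Map each key in `section` (a top-level virtual directory) to the
    location it is mounted from. Hypothetical sketch, not PyFilesystem."""
    # interpolation=None so that '%' in URLs is taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    table = {}
    for name, location in parser.items(section):
        if "://" not in location:
            # A plain path: resolve it against the current directory.
            location = os.path.abspath(location)
            if create_dir:
                os.makedirs(location, exist_ok=True)
        # Anything else (appuser://, http://, zip://, ...) would be
        # handed to the matching opener; here it is kept as a string.
        table[name] = location
    return table


DEV_INI = """
[fs]
user=./user
cache=./user/cache
live=http://www.willmcgugan.com/static/cfg/
"""

table = build_mount_table(DEV_INI)
```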

You can change the section used by a mount opener by specifying it after a # symbol, e.g. mount://fs.ini#mysection.

There are a few changes to this INI file we will need to make when our application is ready for release. User data, site data, logs and cache all have canonical locations that are derived from the name of the application (and the author on Windows). PyFilesystem contains handy openers for these special locations. For example, appuser://examplesoft:myapp detects the appropriate per-user data location for an application called “myapp” developed by “examplesoft”. The other per-application directories have equivalent openers, e.g.:

[fs]
user=appuser://examplesoft:myapp
resources=appsite://examplesoft:myapp
logs=applog://examplesoft:myapp
cache=appcache://examplesoft:myapp
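Why these openers are worth having becomes clear when you write the per-user lookup by hand. The sketch below follows widely used conventions (the roaming `%APPDATA%` folder on Windows, grouped by author; `~/Library/Application Support` on macOS; the XDG data directory on Linux). It is an approximation for illustration, and PyFilesystem's `appuser://` opener may resolve paths differently in detail. `user_data_dir` is a hypothetical function; the platform, environment and home directory are passed in so the result is deterministic.

```python
import ntpath
import os
import posixpath
import sys


def user_data_dir(author, app, platform, env, home):
    """Approximate per-user data directory for `app` on `platform`
    ('win32', 'darwin', or anything else for Linux/Unix)."""
    if platform == "win32":
        # Windows groups application data by author, then app name.
        return ntpath.join(env["APPDATA"], author, app)
    if platform == "darwin":
        return posixpath.join(home, "Library", "Application Support", app)
    # Linux/Unix: honour $XDG_DATA_HOME, defaulting to ~/.local/share.
    base = env.get("XDG_DATA_HOME") or posixpath.join(home, ".local", "share")
    return posixpath.join(base, app)


# Usage on the current machine:
print(user_data_dir("examplesoft", "myapp", sys.platform,
                    os.environ, os.path.expanduser("~")))
```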

The /live/ path is different in that it needs to point to a web server:

live=http://www.willmcgugan.com/static/cfg/
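Conceptually, an HTTP mount maps a path below /live/ onto the base URL. A sketch with `urllib.parse.urljoin` (the helper name `live_url` is hypothetical, and PyFilesystem's HTTP opener may join paths its own way) also shows a detail worth getting right when URLs are joined like this: the trailing slash on the base matters, because without it the last path component is treated as a file name and replaced.

```python
from urllib.parse import urljoin

LIVE_BASE = "http://www.willmcgugan.com/static/cfg/"


def live_url(subpath, base=LIVE_BASE):
    """Map a path below /live/ (e.g. 'version.txt') to a URL under base."""
    # Strip any leading '/', otherwise urljoin would resolve it against
    # the server root instead of the base directory.
    return urljoin(base, subpath.lstrip("/"))
```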

Of course, you don't need to use the canonical locations. For instance, let's say you want to store all your static resources in a zip file. No problem:

resources=zip://./resources.zip
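Reading from a zip mount amounts to looking up the member inside the archive. The standard-library analogue below is only a sketch (the archive is built in memory in place of ./resources.zip, and `read_resource` is a hypothetical helper), but it shows why the application code does not change: the caller still asks for a path under /resources/.

```python
import io
import zipfile


def read_resource(archive, virtual_path):
    """Read '/resources/<name>' from a zip archive holding the
    contents of the resources directory (hypothetical helper)."""
    prefix = "/resources/"
    if not virtual_path.startswith(prefix):
        raise ValueError("not a resources path: %r" % virtual_path)
    with zipfile.ZipFile(archive) as zf:
        return zf.read(virtual_path[len(prefix):])


# Build a small archive in memory in place of ./resources.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("logo.png", b"\x89PNG fake image data")
```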

Or you want to keep your user data on an SFTP (Secure FTP) server:

user=sftp://username:password@example.org/home/will/
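An opener URL like this packs all the connection details into one string. Python's `urllib.parse.urlsplit` shows how it decomposes; this illustrates the URL structure only and is not how PyFilesystem's SFTP opener parses it internally.

```python
from urllib.parse import urlsplit

url = "sftp://username:password@example.org/home/will/"
parts = urlsplit(url)
# parts.username / parts.password: the credentials
# parts.hostname: the server to connect to
# parts.path: the remote directory mounted as /user/
print(parts.username, parts.hostname, parts.path)
```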

Perhaps you don't want to preserve the cache across sessions, for security reasons. The temp opener creates files in a temp directory and deletes them on close:

cache=temp://
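The standard library's `tempfile.TemporaryDirectory` has the same lifetime, which makes the behaviour easy to see: the directory and everything written into it vanish on cleanup. This is an analogy for the temp opener, not its implementation; the file name is arbitrary.

```python
import os
import tempfile

cache = tempfile.TemporaryDirectory(prefix="myapp-cache-")
cache_file = os.path.join(cache.name, "thumb.dat")
with open(cache_file, "wb") as f:
    f.write(b"cached bytes")
existed_before_close = os.path.exists(cache_file)

cache.cleanup()  # like closing a temp:// filesystem: everything is deleted
```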

If you are really paranoid, you can store the cache files in memory without ever writing them to disk:

cache=mem://

Setting /user/ to mem:// is a useful way of simulating a fresh install when debugging.
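A memory filesystem can be pictured as a dictionary from paths to bytes. The toy class below (`MemStore` is hypothetical and far simpler than PyFilesystem's memory filesystem) shows the two properties that matter here: nothing touches the disk, and a new instance starts empty, which is why mounting /user/ on memory behaves like a fresh install.

```python
class MemStore:
    """Toy in-memory store: maps virtual paths to bytes."""

    def __init__(self):
        self._files = {}

    def write(self, path, data):
        self._files[path] = bytes(data)

    def read(self, path):
        return self._files[path]

    def exists(self, path):
        return path in self._files


session = MemStore()
session.write("/thumbs/logo.dat", b"...")
fresh = MemStore()  # a new instance: as if freshly installed
```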

I hope that covers why you might need – or at least want – a virtual filesystem in your application. I've glossed over some of the details and other features of PyFilesystem. If you would like more information, see my previous posts, check out the documentation or join the PyFilesystem discussion group.

kevin —

Great stuff. Is it possible to take this concept and make it work on Google AppEngine?

Will McGugan —

Kevin, I haven't tried it on GAE, but assuming it runs Python 2.5 it should work!

Assuming GAE runs Py2.5 that is…

Kevin —

I think right now GAE can only run 2.5 and not 2.6 or later. It looks like /live/ must be either a zip file or live somewhere on the web, since GAE does not have an accessible file system.

Svend Tofte —

FUSE is also well worth exploring if virtual file systems are your thing. The fusepy bindings are very easy to work with as well. From looking at the documentation, it seems like PyFilesystem can also create a whole new file system (such as exposing printer directories, where putting any file into it causes it to be printed, or what have you).

It would be nice to see a null/loopback file system, such as this one for fusepy:

http://code.google.com/p/fusepy/source/browse/trunk/loopback.py

Maybe it's in the docs somewhere, but I did not see it at least.

Svend, there is FUSE support in PyFilesystem, and Dokan (the Windows equivalent), http://packages.python.org/fs/expose.html

Not sure what a loopback file system would do in this context; the MountFS has that capability. A null filesystem might be useful though…

lamby —

Seen django-fuse?

julou —

I read this post with great interest…
Indeed, I am looking for a way to create a file system that would allow browsing the attachments of Zotero collections in the Finder (or Explorer).
Ideally, this would consist of creating a read-only virtual system in which I would add links to the relevant files.

I am completely new to VFS programming (although I am a convinced user)… So I have several questions:
- what kind of file system would be more appropriate? FUSE?…
- is it possible to create a read-only VFS where only links to files are specified, or do I have to store files in a temp location? If yes, a minimal example would be of great help!
- how to mount a VFS in the Finder/Explorer?

Thank you in advance for your help, Thomas.

Julou, That should be possible. See the Guide for implementers. Once you have created a filesystem you can mount it with FUSE. See the docs and join the mailing list if you need more information.

Thanks for the quick reply.

This helped me to understand that FUSE is not a file system ;) and how I can use it to expose a FS in the finder/explorer…
Nonetheless, the second question stands open in my mind:

is it possible to create a read-only VFS where only links to files are specified, or do I have to store files in a temp location? If yes, a minimal example would be of great help!

montibo —

Julou,
would you eventually share your code? I am a beginner in VFS and try to make the same kind of thing with my blog.

The VFS has a really huge potential.
Cheers

Jacob J. Walker —

I too am interested in making a VFS that I could use from Windows to access Zotero. It seems on the surface that this shouldn't be a difficult project, because Zotero uses SQLite. Although of course I know that programming always gets more complex and takes longer than it seems it would. Has anyone started on this project already, and wants to collaborate? Is Python really the best language to do this in? (I'm not against Python, I just honestly don't know the advantages and disadvantages of using it as the language for this type of project.)

Rohit —

Sir, can you give me the full code?

Steve (Gadget) Barnes —

Nice write-up! For completeness, it might be worth mentioning the idea of using environment variables to switch the location(s) used by your file system between dev/debug and live/production.
