
The ever useful and neat subprocess module

Python’s subprocess module is one of my favourite modules in the standard library. If you have done a decent amount of coding in Python, you have probably encountered it. This module is used for dealing with external commands and is intended to be a replacement for the old os.system and the like.

The most trivial use might be to get the output of a small shell command like ls or ps. Not that this is the best way to get a list of files in a directory (think os.listdir), but you get the point.

I am going to put my notes and experiences about this module here. Please note, I wrote this with Python 2.7 in mind. Things are slightly different in other versions (even 2.6). If you find any errors or have suggestions, please let me know.

A simple usage

For the sake of providing context, let’s run the ls command via subprocess and get its output.

import subprocess
ls_output = subprocess.check_output(['ls'])

I’ll cover getting output from a command in detail later. To give more command line arguments,

subprocess.check_output(['ls', '-l'])

The first item in the list is the executable and the rest are its command line arguments (the argv equivalent). No quirky shell quoting and no complex nested quote rules to digest. Just a plain Python list.

However, not having shell quoting also means you don’t have the shell niceties, piping for one. The following won’t work the way one would expect it to.

subprocess.check_output(['ls', '|', 'wc', '-l'])

Here, the ls command gets | as its first argument, and who knows what ls would do with it. Probably complain that no such file exists. So, instead, we have to use the shell boolean argument, covered later in this article.
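
As an aside, you can get the effect of a pipe without involving a shell by chaining two Popen objects yourself, feeding one’s stdout into the other’s stdin. A sketch, using the Popen class covered below:

from subprocess import Popen, PIPE

# Equivalent of `ls | wc -l`: connect ls's stdout to wc's stdin.
p1 = Popen(['ls'], stdout=PIPE)
p2 = Popen(['wc', '-l'], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # so p1 gets a SIGPIPE if p2 exits early
line_count = p2.communicate()[0]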

Popen class

If there’s just one thing in the subprocess module that you should be concerned with, it’s the Popen class. The other functions like call, check_output, and check_call use Popen internally. Here’s the signature from the docs.

class subprocess.Popen(args, bufsize=0, executable=None, stdin=None,
    stdout=None, stderr=None, preexec_fn=None, close_fds=False, shell=False,
    cwd=None, env=None, universal_newlines=False, startupinfo=None,
    creationflags=0)

I suggest you read the docs for this class. As with most Python docs, it’s really good.

Running via the shell

Subprocess can also run command-line instructions via a shell program. This is usually /bin/sh (often dash or bash) on Linux and cmd.exe on Windows.

subprocess.call('ls | wc -l', shell=True)

Notice that in this case we pass a string, not a list. This is because we want the shell to interpret the whole of our command. You can even use shell style quoting if you like. It is up to the shell to decide how to best split the command line into executable and command line arguments.

On Windows, if you pass a list for args, it is turned into a string using the same rules as the MS C runtime. See the docstring of subprocess.list2cmdline for more on this. On unix-like systems, the reverse happens: even if you pass a string, it’s turned into a list of one item :).

The behaviour of the shell argument can sometimes be confusing, so I’ll try to clear it up a bit here. This is something I wish I had when I first encountered this module.

First, let’s consider the case where shell is set to False, the default. In this case, if args is a string, it is assumed to be the name of the executable file, even if it contains spaces. Consider the following.

subprocess.call('ls -l')

This won’t work because subprocess is looking for an executable file called ls -l, but obviously can’t find it. However, if args is a list, then the first item in this list is considered as the executable and the rest of the items in the list are passed as command line arguments to the program.

subprocess.call(['ls', '-l'])

does what you think it will.

In the second case, with shell set to True, the program that actually gets executed is the OS default shell, /bin/sh on Linux and cmd.exe on Windows. This can be changed with the executable argument.

When using the shell, args is usually a string, something that will be parsed by the shell program. The args string is passed as a command line argument to the shell (with a -c option on Linux) such that the shell will interpret it as a shell command sequence and process it accordingly. This means you can use all the shell builtins and goodies that your shell offers.

subprocess.call('ls -l', shell=True)

is similar to

$ /bin/sh -c 'ls -l'

In the same vein, if you pass a list as args with shell set to True, all items in the list are passed as command line arguments to the shell.

subprocess.call(['ls', '-l'], shell=True)

is similar to

$ /bin/sh -c ls -l

which is the same as

$ /bin/sh -c ls

since /bin/sh takes just the argument next to -c as the command line to execute. Any further items are handed to the shell as its positional parameters, not appended to the command.
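
You can see where the extra items go for yourself; they become the shell’s $0 and $1, not arguments to the command:

# The extra list items become the shell's positional parameters.
subprocess.call(['echo $0 $1', 'foo', 'bar'], shell=True)
# prints: foo bar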

Getting the return code (aka exit status)

If you want to run an external command and its return code is all you’re concerned with, the call and check_call functions are what you’re looking for. They both return the return code after running the command. The difference is, check_call raises a CalledProcessError if the return code is non-zero.
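
A minimal sketch of the difference (ls on a missing file exits non-zero):

rc = subprocess.call(['ls', 'no-such-file'])       # returns the non-zero code
try:
    subprocess.check_call(['ls', 'no-such-file'])  # raises instead
except subprocess.CalledProcessError as e:
    print 'command failed with', e.returncode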

If you’ve read the docs for these functions, you’ll see that it’s not recommended to use stdout=PIPE or stderr=PIPE with them. And if you don’t, the stdout and stderr of the command are just inherited from the parent’s (the Python process’s) streams.

If that is not what you want, you have to use the Popen class.

from subprocess import Popen
proc = Popen('ls')

The moment the Popen class is instantiated, the command starts running. You can wait for it and, after it’s done, access the return code via the returncode attribute.

proc.wait()
print proc.returncode

If you are trying this out in a Python REPL, you might see no need to call .wait(), since you can just wait yourself in the REPL till the command is finished and then access the returncode. Surprise!

>>> proc = Popen('ls')
>>> file1 file2

>>> print proc.returncode
None
>>> # wat?

The command is definitely finished. Why don’t we have a return code?

>>> proc.wait()
0
>>> print proc.returncode
0

The reason for this is that returncode is not automatically set when a process ends. You have to call .wait or .poll for subprocess to notice that the program has finished and to set the returncode attribute.
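
.poll() is the non-blocking sibling of .wait(): it returns None while the process is still running and the return code once it has finished. A sketch:

import time
from subprocess import Popen

proc = Popen('ls')
while proc.poll() is None:
    time.sleep(0.5)  # the process is still running; do something useful here
print proc.returncode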

IO Streams

The simplest way to get the output of a command, as seen previously, is to use the check_output function.

output = subprocess.check_output('ls')

Notice the check_ prefix in the function name? Ring any bells? That’s right: this function will raise a CalledProcessError if the return code is non-zero.

This may not always be the best way to get the output of a command. If you do get a CalledProcessError from this call, you probably have little idea what went wrong, since the error details are usually written to the command’s stderr.
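
For what it’s worth, the exception does carry the command’s stdout in its output attribute; it’s the stderr that goes missing:

try:
    output = subprocess.check_output(['ls', 'no-such-file'])
except subprocess.CalledProcessError as e:
    print e.returncode
    print e.output  # captured stdout; stderr went to the parent's stderr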

Reading error stream

There are two ways to get the error output. The first is to redirect stderr to stdout and only deal with stdout. This can be done by setting the stderr argument to subprocess.STDOUT.
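
A sketch of the first approach (mycmd is a placeholder for your command):

# stderr is merged into stdout, so one read gets everything.
output = subprocess.check_output(['mycmd', 'arg'], stderr=subprocess.STDOUT)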

The second is to create a Popen object with stderr set to subprocess.PIPE (optionally along with the stdout argument) and read from its stderr attribute, which is a readable file-like object. There is also a convenience method on the Popen class, .communicate, which optionally takes a string to be sent to the process’s stdin and returns a tuple of (stdout_content, stderr_content).
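
A sketch of the second approach, again with a placeholder command:

from subprocess import Popen, PIPE

proc = Popen(['mycmd'], stdin=PIPE, stdout=PIPE, stderr=PIPE)
out, err = proc.communicate('data for stdin')  # both streams, as strings
print proc.returncode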

Watching both stdout and stderr

However, all of these assume that the command runs for some time, prints out a couple of lines of output and exits, so you can get the output(s) in strings. This is sometimes not the case. If you want to run a network intensive command like an svn checkout, which prints each file as and when downloaded, you need something better.

The first solution one might think of is this.

from subprocess import Popen, PIPE

proc = Popen(['svn', 'co', 'svn+ssh://myrepo'], stdout=PIPE)
for line in proc.stdout:
    print line

This works, for the most part. But, again, if there is an error, you’ll want to read stderr too. It would be nice to read stdout and stderr simultaneously. Just like a shell seems to be doing. Alas, this remains a not so straightforward problem as of today, at least on non-Linux systems.

On Linux (and wherever it’s supported), you can use the select module to keep an eye on multiple file-like stream objects. But this doesn’t work on Windows, where select only handles sockets. A more platform independent solution that I found works well, is using threads and a Queue.

from subprocess import Popen, PIPE
from threading import Thread
from Queue import Queue, Empty

io_q = Queue()

def stream_watcher(identifier, stream):

    for line in stream:
        io_q.put((identifier, line))

    if not stream.closed:
        stream.close()

proc = Popen(['svn', 'co', 'svn+ssh://myrepo'], stdout=PIPE, stderr=PIPE)

Thread(target=stream_watcher, name='stdout-watcher',
        args=('STDOUT', proc.stdout)).start()
Thread(target=stream_watcher, name='stderr-watcher',
        args=('STDERR', proc.stderr)).start()

def printer():
    while True:
        try:
            # Block for 1 second.
            item = io_q.get(True, 1)
        except Empty:
            # No output in either streams for a second. Are we done?
            if proc.poll() is not None:
                break
        else:
            identifier, line = item
            print identifier + ':', line

Thread(target=printer, name='printer').start()

A fair bit of code. This is a typical producer-consumer setup: two threads produce lines of output (one each from stdout and stderr) and push them into a queue, while one thread watches the queue and prints the lines until the process itself finishes.

Passing an environment

The env argument to Popen (and others) lets you customize the environment of the command being run. If it is not set, or is set to None, the current process’s environment is used, just as documented.

You might not agree with me, but I feel there are some subtleties with this argument that should have been mentioned in the documentation.

Merge with current environment

One is that if you provide a mapping to env, whatever is in this mapping is all that’s available to the command being run. For example, if you don’t include a TOP_ARG entry in the env mapping, the command won’t see TOP_ARG in its environment. So, I frequently find myself doing this

p = Popen('command', env=dict(os.environ, my_env_prop='value'))

This makes sense once you realize it, but I wish it were at least hinted at in the documentation.

Unicode

Another is to do with Unicode (surprise, surprise!). And Windows. If you use unicode strings in the env mapping, you get an error saying you can only use strings in the environment mapping. The worst part about this error is that it only seems to happen on Windows and not on Linux. If it’s an error to use unicode strings here, I wish it broke on both platforms.

This issue is very painful if you’re like me and use unicode all the time.

from __future__ import unicode_literals

That line is present in all my Python source files. The error message doesn’t even bother to mention that you have unicode strings in your env, so it’s very hard to understand what’s going wrong.
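
The workaround I can offer is to coerce everything in the mapping back to byte strings just before the Popen call (a sketch; it assumes the values are ASCII-safe):

import os
from subprocess import Popen

env = dict(os.environ, MY_PROP='value')  # 'value' is unicode under unicode_literals
env = dict((str(k), str(v)) for k, v in env.items())  # back to byte strings
p = Popen('command', env=env)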

Execute in a different working directory

This is handled by the cwd argument. Set it to the directory you want as the working directory of the program you are launching.

The docs do mention that the working directory is changed before the command even starts running, but also that you can’t specify the program’s path relative to cwd. In reality, I found that you can.

Either I’m missing something with this or the docs really are inaccurate. Anyway, this works

subprocess.call('./ls', cwd='/bin')

This prints out all the files in /bin. Of course, the following doesn’t work when the current working directory is not /bin.

subprocess.call('./ls')

So, if you are giving something explicitly to cwd and are using a relative path for the executable, this is something to keep in mind.

Killing and dying

A simple

proc.terminate()

Or for some dramatic umphh!

proc.kill()

will do the trick to end the process. As noted in the documentation, the former sends a SIGTERM and the latter a SIGKILL on unix, but both call the native TerminateProcess on Windows.
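
A common pattern is to ask nicely first and escalate only if needed (a sketch; the two second grace period is an arbitrary choice):

import time

proc.terminate()         # polite: SIGTERM
time.sleep(2)            # give it a moment to clean up
if proc.poll() is None:  # still alive?
    proc.kill()          # dramatic umphh: SIGKILL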

Auto-kill on death

The processes you start in your Python program stay running even after your program exits. This is usually what you want, but when you want all your subprocesses killed automatically on exit, with Ctrl+C or the like, you can use the atexit module.

import atexit

procs = []

@atexit.register
def kill_subprocesses():
    for proc in procs:
        if proc.poll() is None:  # skip processes that already finished
            proc.kill()

And add every Popen object you create to the procs list. This is the solution I found that works best.
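
Adding to the list is then just (mycmd being a placeholder):

proc = Popen(['mycmd'])
procs.append(proc)  # registered for cleanup at exit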

Launch commands in a terminal emulator

On one occasion, I had to write a script that would launch multiple svn checkouts and then run many ant builds (~20-35) on the checked out projects. In my opinion, the best and easiest way to do this is to fire up multiple terminal emulator windows, each running an individual checkout or ant build. This allows us to monitor each process and even cancel any of them by simply closing the corresponding terminal emulator window.

Linux

This is pretty trivial actually. On Linux, you can use xterm for this.

Popen(['xterm', '-e', 'sleep 3s'])

Windows

On Windows, it’s not as straightforward. The first solution for this would be

Popen(['cmd', '/K', 'command'])

The /K option tells cmd to run the command and keep the window from closing. You may use /C instead to close the window after the command finishes.

As simple as it looks, it has some weird behaviour. I don’t completely understand it, but I’ll try to explain what I have observed. When you run a Python script with the above Popen call in a command window, like this

python main.py

you don’t see a new command window pop up. Instead, the sub-command runs in the same command window. I have no idea what happens when you run multiple sub-commands this way. (I have only limited access to Windows.)

If instead you run it from something like an IDE or IDLE (F5), a new command window opens up, one for each command you run this way. Just the way you expect.

But I gave up on cmd.exe for this purpose and learnt to use the mintty utility that comes with Cygwin (1.7+, I think). mintty is awesome. Really. It’s been a while since I felt that way about a command line utility on Windows.

Popen(['mintty', '--hold', 'error', '--exec', 'command'])

This opens a new mintty console window running the command, and the window closes automatically if the command exits with zero status; otherwise it stays open (that’s what --hold error does). Very useful.

Conclusion

The subprocess module is a very useful thing. Spend some time getting to know it better. This is my attempt at helping people with it, and it turned out way longer than I’d expected. If there are any inaccuracies in this, or if you have anything to add, please leave a comment.
