Tuesday 28 May 2013

Better than system("touch")

I've seen a lot of people use
system("touch");
to make sure a file exists, and/or has a recent access/modification time. For example, see here, here and here.
I'm here to tell you: system() sucks! Why? Take a look at man system:
       int system(const char *command);

DESCRIPTION
       system() executes a command specified in command by calling /bin/sh
       -c command, and returns after the command has been completed.
So it's not just running the touch program. It's starting a shell, then running whatever you passed in that shell. This is:
  1. Slow. First you start a shell, then you start another program from that shell? Seems like a lot of hassle.
  2. A security risk. Say you take the filename from the user, then run something like:
    std::stringstream ss;
    ss << "touch " << filename;
    system(ss.str().c_str());
    
    What happens if I (the malicious user) give input like "fakename ; rm -rf --no-preserve-root /;"? Well, it creates fakename (or updates its timestamp), then tries to delete everything!
  3. Very platform dependent. The POSIX Standard has this to say:
    "[T]he system() function shall pass the string pointed to by command to that command processor to be executed in an implementation-defined manner; this might then cause the program calling system() to behave in a non-conforming manner or to terminate."
    And that's just system. The utility you are calling may vary significantly. Alright, touch probably won't, but I've seen people use system with, for instance, ls, whose output will vary significantly in format across platforms.
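To make the second point concrete, here is a minimal sketch that builds the same command string as the snippet in point 2 but only returns it, rather than handing it to system(). The function name build_touch_command is made up for illustration, and the hostile input in the note below uses a harmless echo in place of rm.

```cpp
#include <sstream>
#include <string>

// Builds the command string exactly as the vulnerable snippet does,
// but returns it instead of passing it to system().
// (build_touch_command is a hypothetical name, not part of any library.)
std::string build_touch_command(const std::string& filename)
{
    std::stringstream ss;
    ss << "touch " << filename;
    return ss.str();
}
```

Calling build_touch_command("fakename ; echo injected") yields the string "touch fakename ; echo injected". Since system() hands that string to /bin/sh -c, the shell treats the semicolon as a command separator and runs two commands, not one.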
So what should we do instead? Well, obviously someone wrote touch, so we should be able to replicate its behaviour from our own program. The logic surrounding parsing arguments and so on is something we should be pretty familiar with. What we need to know is how touch actually creates and updates a file. It needs to make calls out to the operating system ("system calls"). There is a handy command line tool to see what system calls are being made by a program, called strace (on some systems, truss). I don't know a full list of which to use where, but I do know it's strace on Linux, and truss on Solaris, FreeBSD and AIX.
I ran strace touch twice, once to create a file, then once to update it. It was basically the same each time, so I'll just show one. You get a lot of cruft just from a program starting up, obtaining heap memory, etc, but I cut it down to just the relevant bits:
$ strace touch testfile
...
open("testfile", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3
dup2(3, 0)                              = 0
close(3)                                = 0
dup2(0, 0)                              = 0
utimensat(0, NULL, NULL, 0)             = 0
...
The two we care about are open and utimensat. Respectively, these open a file, creating it if necessary (O_CREAT), and update the timestamp. open takes:
  1. const char* pathname
    The path (absolute or relative) of the file (or directory) to be opened.
  2. int flags
    A bitmask of flags, ORed together, indicating how to open the path, e.g. O_CREAT to create the file if it doesn't already exist.
  3. mode_t mode
    Only required if O_CREAT is provided, this argument provides the permissions with which to create the file. The bits set in your umask are cleared from it: mode & ~umask.
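The umask filtering applied to the mode argument can be sketched as a bitwise AND with the complement of the mask. The helper name effective_mode is made up for illustration; the kernel does this for you when open creates the file.

```cpp
#include <sys/types.h>
#include <sys/stat.h>

// Permissions a newly created file ends up with: every bit that is set
// in the umask is cleared from the requested mode.
// (effective_mode is a hypothetical helper, not a system call.)
mode_t effective_mode(mode_t requested, mode_t mask)
{
    return requested & ~mask;
}
```

For example, with the 0666 that touch requests and a typical umask of 0022, effective_mode(0666, 0022) gives 0644, i.e. rw-r--r--.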
utimensat takes:
  1. int dirfd
    An open file descriptor to a directory from which to interpret a relative path. We will use the special value AT_FDCWD, which just means we interpret relative paths from the working directory of the program.
  2. const char* pathname
    As above.
  3. const struct timespec times[2]
    Two sets of values defining the times to be set: the access time first, then the modification time. By passing a null pointer for this array, we just get the current time for both.
  4. int flags
    Another bitmask specifying details of how the call will be carried out. Nothing relevant to us.
So we put these together in a C++ way, and get something like:
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <fcntl.h>
#include <unistd.h>
#include <utime.h>

#include <iostream>
#include <string>

#include <cstdlib>

void touch(const std::string& pathname)
{
    int fd = open(pathname.c_str(),
                  O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK,
                  0666);
    if (fd<0) // Couldn't open that path.
    {
        std::cerr
            << __PRETTY_FUNCTION__
            << ": Couldn't open() path \""
            << pathname
            << "\"\n";
        return;
    }
    // The descriptor was only needed to create the file; close it so it
    // isn't leaked.
    close(fd);
    int rc = utimensat(AT_FDCWD,
                       pathname.c_str(),
                       nullptr,
                       0);
    if (rc)
    {
        std::cerr
            << __PRETTY_FUNCTION__
            << ": Couldn't utimensat() path \""
            << pathname
            << "\"\n";
        return;
    }
    std::clog
        << __PRETTY_FUNCTION__
        << ": Completed touch() on path \""
        << pathname
        << "\"\n";
}

int main(int argc, char* argv[])
{
    if (argc!=2)
        return EXIT_FAILURE;
    touch (argv[1]);
    return EXIT_SUCCESS;
}
Of course, it would be very easy to rewrite this function in C. Also, if you only want to make sure the file exists, and don't care about the timestamps, you could just create a std::ofstream (remembering to pass std::ios::app so an existing file isn't truncated, and to check is_open()).

6 comments:

  1. Do you know why touch includes this explicit zeroing of the subsecond portion of the time using that utimensat() syscall? Why not let the subsecond time float with the actual file creation time, retaining the subsecond granularity if the filesystem supports it?

  2. You have a memory leak.

    Should close file descriptor after usage.

  3. Your strace output has the pathname as NULL. Reading the Linux manpage, we find that, on Linux: "futimens(fd, times) is implemented as: utimensat(fd, NULL, times, 0);". (This is not a standard feature of utimensat().)

    So, this tells us that touch is using futimens(), not utimensat(), which makes sense. It avoids a second path lookup, but it also means that there can't be any race conditions where the file system changes and you unintentionally update the time on a different file from the one you just opened/created. So, your implementation would be better if it switched to futimens(), too. It might be more secure, too, given that the multiple path lookup pattern frequently has security implications, even if I don't see any in this case.

  4. Touch is open source. A better approach than strace is to simply look at the source code.

    What if it does some crucial conditional for exceptional cases? Or important conditional compilation for compatibility?

    https://github.com/wertarbyte/coreutils/blob/master/src/touch.c
