Wednesday, 16 March 2016

Shakespearian Insult Generator

See quizlet.com for list of terms, and mouse over above for definitions.

Saturday, 27 February 2016

Why You Shouldn't Use std::endl

There are two terrible habits that I always see in beginner questions on Stack Overflow, that seem to be taught in a lot of books, online tutorials, classes, etc. If your resource is teaching either of these, it may be a sign that you should ditch it, and pick up a good book.
These are:
  • using namespace std;
  • std::cout << "Foo" << std::endl;
    (or worse
    cout << "Foo" << endl
    ).
There are many places to read about why the first is bad, but I’ve been lacking a good link for the second. So that’s what I want to talk about here.
Do you use std::endl to end lines when streaming text? Do you know what it does?
std::endl does two things:
  • writes '\n' to the stream.
  • flushes the stream.
And nothing else. I’ve seen people mention that it is the right thing to use to get cross-platform line endings. This is just wrong; streaming std::endl is guaranteed to write exactly the same character as streaming '\n', and a stream opened in text mode expands that character into the platform's canonical line ending either way (for instance, it becomes <CR><LF> on Windows). std::endl exists in the Standard Library only for those situations where you want to both write a newline character and flush the stream. With the benefit of hindsight, I think bundling the two was a mistake.
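To make the two effects concrete, here is a minimal sketch that counts how often a stream's buffer gets flushed. CountingBuf and the two functions are hypothetical helpers invented for this illustration; they are not part of the Standard Library.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical helper: a string buffer that counts how often it is flushed.
// std::ostream::flush() ends up calling sync() on the underlying buffer.
class CountingBuf : public std::stringbuf
{
    public:
        int flushes = 0;

    protected:
        int sync() override
        {
            ++flushes;
            return std::stringbuf::sync();
        }
};

// Stream a line ending in '\n' and report how many flushes that caused.
int flushesAfterNewline()
{
    CountingBuf buf;
    std::ostream os(&buf);
    os << "Foo" << '\n';
    return buf.flushes;
}

// Same line, but ending in std::endl.
int flushesAfterEndl()
{
    CountingBuf buf;
    std::ostream os(&buf);
    os << "Foo" << std::endl;
    return buf.flushes;
}
```

The '\n' version should report no flushes and the std::endl version exactly one; the characters written are identical.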
In my mind, these are two entirely unrelated operations. The first is simply writing a character to the target underlying the stream, no more special than any other. The second is an administrative action on the stream object itself, and one that I have rarely seen a good reason to carry out manually. Why would you want to do both at once? Some possibilities:
  • You have some urgent output you want the user to see immediately. Sounds like a perfect case for std::cerr, which has unitbuf set so will display all output immediately and never needs to be manually flushed, entirely for this purpose.
  • You want to display a prompt and make sure it appears before asking for input. std::cout and std::cin are tied together, so this will happen automatically. (Note they are also synced with the C equivalents).
  • You want some sort of live updating output that is not urgent per se, and is not part of user interaction (interleaved with reading). Well it may be the case that for a live updating UI (e.g. top), basic console output is not the best thing, and you should use a toolkit such as ncurses. But let’s say you are just writing a basic example, like this pendulum simulator:
    while (true) {
        std::cout << "Tick" << std::endl;
        sleep(1);
        std::cout << "Tock" << std::endl;
        sleep(1);
    }
    Ok, now I’ve painted myself into a corner where you might legitimately want to flush the stream each time. But the delimiter is still irrelevant to the flushing. What if you decided to separate the ticks with spaces or tabs instead? And anyway, many implementations of std::cout are line-buffered when writing to an actual terminal, for instance libstdc++ (default with gcc), but of course you can’t 100% rely on that and remain completely cross-platform.
So I’ve argued that writing a newline and flushing a stream are unrelated operations, and you rarely want to do the latter anyway. But maybe you do occasionally want to do both at the same time. Isn’t std::endl ideal for that?
I’d still argue no. It’s very important to clearly express your intent in code. Comments are important of course, but it’s even better when they simply aren’t necessary. Comments can be out of date. The code can’t. Now many beginners (and some more experienced programmers, sadly) simply don’t know that std::endl flushes. So when I see it used, I simply have no idea if the original author really intended to flush or not. I see many uses of std::endl where flushing makes absolutely no sense whatsoever, and plenty of uses where it is certainly not clear that flushing is useful.
So what do I recommend? Use '\n', and std::flush if you really do mean it. You may as well put the '\n' into the preceding string literal while you are at it.
std::cout << "foo\n";
std::cout << "Some int: " << i << '\n';
std::cout << "bar\n" << std::flush;
If your printing is a bit convoluted and you really do want to make it clear where you are printing a newline, you can separate it from the preceding string literal, and even give it a name if you like:
namespace cds {
    char const nl = '\n';
}
// ...
std::cout << "Tick" << cds::nl;
Or you can model it more closely on std::endl:
namespace cds {
    std::ostream& nl(std::ostream& os) {
        return os << '\n';
    }
}
// ...
std::cout << "Tick" << cds::nl;
If you stream a function that takes and returns a std::ostream&, operator<< calls it on the stream. That is how manipulators such as std::endl work.
And this is without mentioning the genuine performance problem all the extra flushing can cause. (Dietmar also provides a better nl manipulator, that will work on streams with a character type other than char).

Sunday, 28 July 2013

Contextually converted to bool

Something I found mildly surprising about C++11 is that this works:
#include <iostream>

struct Testable
{
    explicit operator bool() const { return true; }
};

int main()
{
    Testable t;
    if (t)
        std::cout << "Converted to true!\n";
}
That is, it compiles and prints Converted to true!.
The new bit here is the explicit keyword. When I first saw an example like this, I expected to have to write
if (bool( t )) // ...
If this was an operator someOtherType(), we would have to write the conversion explicitly. If we wanted to use t say as an argument to a function accepting bool, we would also have to write the conversion explicitly (it is the explicit keyword, after all!).
What makes this work is this wording in The Standard, at Section 4 Standard conversions, paragraph 3:
An expression e can be implicitly converted to a type T if and only if the declaration T t=e; is well-formed, for some invented temporary variable t (8.5). Certain language constructs require that an expression be converted to a Boolean value. An expression e appearing in such a context is said to be contextually converted to bool and is well-formed if and only if the declaration bool t(e); is well-formed, for some invented temporary variable t (8.5). The effect of either implicit conversion is the same as performing the declaration and initialization and then using the temporary variable as the result of the conversion.
So basically, in any kind of logical condition, an explicit operator bool() can be called without having to write bool(...). This replaces the need for workarounds such as operator void*, or The Safe bool Idiom (if you were wondering why you wouldn't just use operator bool() without the explicit, that link will explain it).
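The difference between the two declarations in that wording, T t=e; and bool t(e);, can be checked with type traits. A minimal sketch (isSet is a hypothetical function, just there to show a condition doing the conversion):

```cpp
#include <type_traits>

struct Testable
{
    explicit operator bool() const { return true; }
};

// "bool b = t;" (copy-initialization) is ill-formed, so Testable is
// not implicitly convertible to bool...
static_assert(!std::is_convertible<Testable, bool>::value,
              "explicit operator bool() blocks implicit conversion");

// ...but "bool b(t);" (direct-initialization) is well-formed, which is
// what contextual conversion asks for.
static_assert(std::is_constructible<bool, Testable>::value,
              "explicit operator bool() allows direct-initialization");

// The first operand of ?: is contextually converted to bool.
// Writing "return t;" here would not compile.
bool isSet(const Testable& t)
{
    return t ? true : false;
}
```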

I've been trying to enumerate all the cases in The Standard where it uses the wording "contextually converted to bool", but I may have missed some:
  • Negation: !t (5.3.1 p9).
  • Logical AND: t&&s (5.14).
  • Logical OR: t||s (5.15).
  • Conditional operator: t?"yup":"nope" (5.16 p1).
  • Selection statement (other than switch): if (t) or if (Testable t{}) (6.4 p4).
  • for statement, for(;t;) //..., and
  • while statement, while(t) //.... The wording isn't used directly for these, and they are actually defined in section 6.5, but 6.4 p2 says "The rules for conditions apply both to selection-statements and to the for and while statements (6.5)."
  • do statement: do {//...} while (t); (6.5.2 p1).
  • static_assert-declaration: static_assert(t, "message"); (note you will need constexpr here) (7 p4).
  • Exception specification: SomeType foo() noexcept(t); (note you will need constexpr here) (15.4 p1).
  • NullablePointer concept: for any type P used where the standard library requires a type fulfilling the NullablePointer concept, an instance of P must be contextually convertible to bool (17.6.3.3 p3).
  • For any algorithm in the <algorithm> header that takes a template parameter named Predicate, an instance pred of that type must support pred(*first) being contextually converted to type bool. (25.1 p8)
    Similarly, for a BinaryPredicate binary_pred, it is required that binary_pred(*first1, *first2) or binary_pred(*first1, value) can be contextually converted to type bool. (25.1 p9)
  • For any algorithm taking a comparator type Compare, for an instance Compare comp, the return value when contextually converted to type bool must convert to true if the first argument is less than the second, and false otherwise. (25.4 p2)
And I think that's it! This is used in various places throughout the C++11 Standard; for instance, in the basic_ios template, operator void*() has been replaced with explicit operator bool(). It is also used to simulate checking for NULL in the Standard smart pointers, shared_ptr and unique_ptr. Have a look through these.
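A few of the language-level contexts from the list, in one small sketch. Flag, describe and valueOr are hypothetical names used only for this illustration.

```cpp
#include <memory>
#include <string>

// Hypothetical type with an explicit conversion to bool.
struct Flag
{
    bool value;
    explicit operator bool() const { return value; }
};

std::string describe(const Flag& t, const Flag& s)
{
    std::string out;
    if (!t)      out += "not-t ";   // negation
    if (t && s)  out += "both ";    // logical AND
    if (t || s)  out += "either ";  // logical OR
    out += t ? "yup" : "nope";      // conditional operator
    return out;
}

// The smart pointers rely on the same mechanism to allow "if (ptr)".
int valueOr(const std::unique_ptr<int>& ptr, int fallback)
{
    return ptr ? *ptr : fallback;
}
```

None of these need a bool(...) around the operand, even though the conversion operator is explicit.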

Note: I don't have the official ISO C++ Standard published 2011-09-01. I am looking at N3337, the first draft published after the official Standard, on 2012-01-16. In N3337, the wording for Logical AND is contextually converted to type bool. This is corrected in the most recent C++14 draft N3691 (2013-05-16).

Let me make it entirely clear that the previous methods of achieving this behaviour (e.g. operator void*()) still work as ever. It's just there is now a nicer way (I consider it nicer in that it more clearly indicates intent, and actually does what you wanted).

Tuesday, 28 May 2013

Better than system("touch")

I've seen a lot of people use
system("touch");
to make sure a file exists, and/or has a recent access/modification time. For example, see here, here and here.
I'm here to tell you: system() sucks! Why? Take a look at man system:
       int system(const char *command);

DESCRIPTION
       system() executes a command specified in command by calling /bin/sh
       -c command, and returns after the command has been completed.
So it's not just running the touch program. It's starting a shell, then running whatever you passed in that shell. This is:
  1. Slow. First you start a shell, then you start another program from that shell? Seems like a lot of hassle.
  2. A security risk. Say you take the filename from the user, then run something like:
    std::stringstream ss;
    ss << "touch " << filename;
    system(ss.str().c_str());
    
    What happens if I (the malicious user) give input like "fakename ; rm -rf --no-preserve-root /;"? Well it creates(/updates the timestamp of) fakename, then tries to delete everything!
  3. Very platform dependent. The POSIX Standard has this to say:
    [T]he system() function shall pass the string pointed to by command to that command processor to be executed in an implementation-defined manner; this might then cause the program calling system() to behave in a non-conforming manner or to terminate.
    And that's just system. The utility you are calling may vary significantly. Alright, touch probably won't, but I've seen people use system with, for instance, ls, whose output will vary significantly in format across platforms.
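To see the security problem without running anything, it helps to look at the exact string the shell would receive. A minimal sketch: buildTouchCommand is a hypothetical helper that only builds the string and never calls system().

```cpp
#include <sstream>
#include <string>

// Builds the string that would be handed to system(); nothing is run here.
std::string buildTouchCommand(const std::string& filename)
{
    std::stringstream ss;
    ss << "touch " << filename;
    return ss.str();
}
```

With the input "fakename ; echo gotcha" the result is "touch fakename ; echo gotcha", which a shell reads as two separate commands. Nothing in the string marks where the file name was supposed to end.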
So what should we do instead? Well obviously someone wrote touch, so we should be able to replicate its behaviour from our own program. The logic surrounding parsing arguments and so on is something we should be pretty familiar with. What we need to know is how touch actually creates and updates a file. It needs to make calls out to the operating system ("system calls"). There is a handy command line tool to see what system calls are being made by a program, called strace (on some systems, truss; I don't know a full list of which to use where, but I do know it's strace on Linux and AIX, truss on Solaris and FreeBSD).
I ran strace touch twice, once to create a file, then once to update it. It was basically the same each time, so I'll just show one. You get a lot of cruft just from a program starting up, obtaining heap memory, etc, but I cut it down to just the relevant bits:
$ strace touch testfile
...
open("testfile", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3
dup2(3, 0)                              = 0
close(3)                                = 0
dup2(0, 0)                              = 0
utimensat(0, NULL, NULL, 0)             = 0
...
The two we care about are open and utimensat. Respectively, these open a file, creating it if necessary (O_CREAT), and update the timestamp. open takes:
  1. const char* pathname
    The path (absolute or relative) of the file (or directory) to be opened.
  2. int flags
    A bitmask of flags, ORed together, indicating how to open the path, e.g. O_CREAT to create the file if it doesn't already exist.
  3. mode_t mode
    Only required if O_CREAT is provided, this argument provides the permissions with which to create the file. This will be filtered against your umask: the bits set in the umask are cleared from the mode, i.e. mode & ~umask.
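As a worked example of the umask filtering: every bit set in the umask is cleared from the requested mode. effectiveMode is a hypothetical helper that just does the arithmetic; it does not read the process's real umask.

```cpp
#include <sys/types.h>

// Permissions a new file ends up with: every bit set in the umask is
// cleared from the requested mode.
mode_t effectiveMode(mode_t requested, mode_t mask)
{
    return requested & ~mask;
}
```

So requesting 0666 under the common umask of 0022 gives 0644, and under 0077 gives 0600.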
utimensat takes:
  1. int dirfd
    An open file descriptor to a directory from which to interpret a relative path. We will use the special value AT_FDCWD, which just means we interpret relative paths from the working directory of the program.
  2. const char* pathname
    As above.
  3. const struct timespec times[2]
    Two sets of values defining the times to be set. By passing a null pointer for this array, we just get the current time.
  4. int flags
    Another bitmask specifying details of how the call will be carried out. Nothing relevant to us.
So we put these together in a C++ way, and get something like:
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <fcntl.h>
#include <unistd.h>
#include <utime.h>

#include <iostream>
#include <string>

#include <cstdlib>

void touch(const std::string& pathname)
{
    int fd = open(pathname.c_str(),
                  O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK,
                  0666);
    if (fd<0) // Couldn't open that path.
    {
        std::cerr
            << __PRETTY_FUNCTION__
            << ": Couldn't open() path \""
            << pathname
            << "\"\n";
        return;
    }
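    // The descriptor was only needed to create the file. Close it so it
    // doesn't leak; utimensat() works on the path, not the descriptor.
    close(fd);
    int rc = utimensat(AT_FDCWD,
                       pathname.c_str(),
                       nullptr,
                       0);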
    if (rc)
    {
        std::cerr
            << __PRETTY_FUNCTION__
            << ": Couldn't utimensat() path \""
            << pathname
            << "\"\n";
        return;
    }
    std::clog
        << __PRETTY_FUNCTION__
        << ": Completed touch() on path \""
        << pathname
        << "\"\n";
}

int main(int argc, char* argv[])
{
    if (argc!=2)
        return EXIT_FAILURE;
    touch (argv[1]);
    return EXIT_SUCCESS;
}
Of course, it would be very easy to rewrite this function in C. Also, if you only want to make sure the file exists, and don't care about the timestamps, you could just create a std::ofstream (remembering to pass std::ios::app so existing content isn't truncated, and to check is_open()).
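A minimal sketch of that std::ofstream alternative, for when the timestamps don't matter. ensureFileExists is a hypothetical name.

```cpp
#include <fstream>
#include <string>

// Makes sure the file exists without truncating existing content:
// std::ios::app opens for appending and creates the file if needed.
bool ensureFileExists(const std::string& pathname)
{
    std::ofstream file(pathname, std::ios::app);
    return file.is_open();
}
```

Without std::ios::app, the default open mode for std::ofstream would truncate an existing file, which is almost certainly not what you want from something standing in for touch.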

Reading Input with std::getline

A lot of beginners seem to have trouble with more complicated input, especially with reading in a loop until the end of a file (or end of input through std::cin). I thought about trying to go through all the various things I've seen people do wrong, but that could get pretty messy. So instead I thought I'd just show some examples building up to doing it right (or at least in a way that works, and that I think is fairly nice. "Right" is open to a lot of interpretation).
Let's start simple. Read a user's name, and greet them:
#include <iostream>
#include <string>

std::string readName()
{
    std::cout << "What's your name? ";
    std::string name;
    std::cin >> name;
    return name;
}

void greet (const std::string& name)
{
    std::cout << "Hello, " << name << '\n';
}

int main()
{
    greet(readName());
}
(By the way, if you didn't already know, it is safe to bind a const T& to a temporary, but not a T&). We run this:
$ ./hello1 
What's your name? Chris Sharpe
Hello, Chris
Pretty close, but reading with std::cin >> someString stops at the first whitespace. Instead we are going to use std::getline. The only change is in readName():
std::string readName()
{
    std::cout << "What's your name? ";
    std::string name;
    std::getline(std::cin, name); // Change here.
    return name;
}
Then:
$ ./hello2 
What's your name? Chris Sharpe
Hello, Chris Sharpe
That's what we want! What about a more complicated, and more useful, example? Reading a configuration file, where each option might have a different type of value, for instance a filename (std::string) or switch for some program behavior (bool). It is convenient to use the formatted input we get with >>, but we want to actually read from the input file with getline.
The important functions are parseConfigFile() and parseConfigLine().
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

// Class to hold the program's configuration options.
class Config
{
    private:
        // Member data
        std::string someFilePath_;
        int         someSize_;
        bool        someSwitch_;

        // Static data: names of options
        static const std::string SOME_FILE_PATH;
        static const std::string SOME_SIZE;
        static const std::string SOME_SWITCH;

        // Private functions
        void parseConfigFile(const std::string&);
        void parseConfigLine(const std::string&);

    public:
        // Constructors
        Config();
        Config(const std::string&);

        // Accessors
        const std::string& someFilePath() const;
        int                someSize()     const;
        bool               someSwitch()   const;

        std::string dumpConfigAsString()  const;
};


int main(int argc, char* argv[])
{
    if (argc>1)
    {
        Config config{argv[1]};
        std::cout << config.dumpConfigAsString();
    }
    else
    {
        Config defaultConfig{};
        std::cout << defaultConfig.dumpConfigAsString();
    }
    return 0;
}


// class Config

const std::string Config::SOME_FILE_PATH {"some_file_path"};
const std::string Config::SOME_SIZE {"some_size"};
const std::string Config::SOME_SWITCH {"some_switch"};

Config::Config()
    // Set the defaults
    : someFilePath_{"default.png"}
    , someSize_{2}
    , someSwitch_{true}
{}

Config::Config(const std::string& configFilePath)
    : Config{} // Set the defaults
{
    parseConfigFile(configFilePath);
}


void Config::parseConfigFile(const std::string& configFilePath)
{
    std::ifstream inpFile{configFilePath};
    if (!inpFile.is_open())
    {
        std::cerr
            << "Could not open configuration file \""
            << configFilePath
            << "\"\n" // Note the string concatenation
               "Using default configuration options.\n";
        return;
    }

    std::string configLine;
    // Read a line at a time.
    // Doing this inside the loop condition means
    // we end correctly at the bottom of the file.
    while ( std::getline(inpFile, configLine) )
    {
        parseConfigLine(configLine);
    }
}

void Config::parseConfigLine(const std::string& configLine)
{
    // Ignore comment or empty lines
    if (configLine.empty() || '#' == configLine[0])
        return;

    // Split the line using >> operations
    std::istringstream iss {configLine};

    std::string configOption;
    iss >> configOption;

    // Compare against the known configurable options
    if ( SOME_FILE_PATH == configOption )
    {
        std::string tmpString;
        if (iss >> tmpString)
        {
            someFilePath_ = std::move(tmpString);
        }
        else // The read to std::string failed
        {
            std::cerr
                << "Failed to read configuration option \""
                << SOME_FILE_PATH
                << "\" as a string.\n";
        }
    }
    else if ( SOME_SIZE == configOption )
    {
        int tmpInt;
        if (iss >> tmpInt)
        {
            someSize_ = tmpInt;
        }
        else // The read to int failed
        {
            std::cerr
                << "Failed to read configuration option \""
                << SOME_SIZE
                << "\" as an integer.\n";
        }
    }
    else if ( SOME_SWITCH == configOption )
    {
        bool tmpBool;
        if (iss >> std::boolalpha >> tmpBool)
        {
            someSwitch_ = tmpBool;
        }
        else // The read to bool failed
        {
            std::cerr
                << "Failed to read configuration option \""
                << SOME_SWITCH
                << "\" as a boolean switch.\n";
        }
    }
    else
    {
        std::cerr
            << "Unrecognised configuration option \""
            << configOption
            << "\"\n";
    }
}

const std::string& Config::someFilePath() const
{
    return someFilePath_;
}

int                Config::someSize()     const
{
    return someSize_;
}

bool               Config::someSwitch()   const
{
    return someSwitch_;
}

std::string Config::dumpConfigAsString()  const
{
    std::ostringstream oss;
    oss
        << SOME_FILE_PATH << ' ' << someFilePath() << '\n'
        << SOME_SIZE      << ' ' << someSize()     << '\n'
        << SOME_SWITCH    << ' ' << std::boolalpha
                                 << someSwitch()   << '\n';
    return oss.str();
}
This is fairly long, just to make it a realistic example, but there are two absolutely key points:
  1. Don't mix use of operator>> and getline on the same stream. operator>> will leave a newline character on the stream. Any following use of getline would immediately hit that and return an empty line. This can really confuse people, especially when the reads happen far apart in code, so there is no obvious connection.
  2. Have the read as the loop condition when reading a whole file. This is partly because you want to check for failure immediately after the read, and partly because you might otherwise be tempted to check stream.eof(). That only works if you have just tried to read past the end of the file; it doesn't work if the last read merely used up all the characters, and you'll get completely stuck if the last few characters can't be read in the way you want (i.e. you are using formatted input with operator>>). Consider this badly broken example:
    int main()
    {
        int tmpInt;
        while (!std::cin.eof())
        {
            std::cout << "Number: ";
            std::cin >> tmpInt;
            std::cout << "Read " << tmpInt << '\n';
        }
    }
    
    A run of this program might go like this:
    $ ./a.out 
    Number: 123
    Read 123
    Number: abc
    Read 0
    Number: Read 0
    Number: Read 0
    Number: Read 0
    ...
    
    You can see it gets stuck in the loop because it can't actually read any further, but we technically haven't read past the end of the file.
If you've been paying attention, you'll have noticed that parseConfigLine() is broken if the file name has a space in it. How could you fix that?
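One possible answer, as a sketch rather than the only fix: read the option name with operator>>, skip the separating whitespace with std::ws, then take everything that remains with getline. That does mix the two on one stream, but the stream only ever holds a single line, so there is no stray newline to trip over. splitConfigLine is a hypothetical helper and is not part of the Config class above.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Returns {option, value}, where value is everything after the option
// name and the whitespace that follows it. Trailing whitespace in the
// value is kept as-is.
std::pair<std::string, std::string>
splitConfigLine(const std::string& configLine)
{
    std::istringstream iss(configLine);

    std::string option;
    iss >> option;

    std::string value;
    std::getline(iss >> std::ws, value);

    return std::make_pair(option, value);
}
```

For string-valued options the value can be used directly; for int or bool options it could be fed into another std::istringstream and parsed with >> as before.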
As an aside, you might have heard of the GNU Readline Library which allows entry history, line editing, and other cool stuff. Recently I was working with a little test application that would be much easier to use with these features, but I couldn't make the changes to use Readline, so I started writing a little wrapper shell script. After wasting a fair amount of effort for something that worked ok, I found rlwrap. Guess I should have searched harder in the first place. There's always someone who's already solved your problem.