25 January 2011

Converting to and from human-readable byte counts

Converting some number of bytes (say 1024) into a human-readable byte count (say "1K") seems to be a common problem (simply search or ask StackOverflow). I also needed to go the other direction and turn things like "1.5 MB" into 1,572,864. Here's what I cooked up in C99:

// Adapted from http://stackoverflow.com/questions/3758606/
// how-to-convert-byte-size-into-human-readable-format-in-java
void to_human_readable_byte_count(long bytes,
                                  int si,
                                  double *coeff,
                                  const char **units)
{
    // Static lookup table of byte-based SI units
    static const char *suffix[][2] = { { "B",  "B"   },
                                       { "kB", "KiB" },
                                       { "MB", "MiB" },
                                       { "GB", "GiB" },
                                       { "TB", "TiB" },
                                       { "EB", "EiB" },
                                       { "ZB", "ZiB" },
                                       { "YB", "YiB" } };
    int unit = si ? 1000 : 1024;
    int exp = 0;
    if (bytes > 0) {
        exp = min( (int) (log(bytes) / log(unit)),
                   (int) sizeof(suffix) / sizeof(suffix[0]) - 1);
    }
    *coeff = bytes / pow(unit, exp);
    *units  = suffix[exp][!!si];
}

// Convert strings like the following into byte counts
//    5MB, 5 MB, 5M, 3.7GB, 123b, 456kB
// with some amount of forgiveness baked into the parsing.
long from_human_readable_byte_count(const char *str)
{
    // Parse leading numeric factor
    char *endptr;
    errno = 0;
    const double coeff = strtod(str, &endptr);
    if (errno) return -1;

    // Skip any intermediate white space
    while (isspace(*endptr)) ++endptr;

    // Read off first character which should be an SI prefix
    int exp  = 0;
    int unit = 1024;
    switch (toupper(*endptr)) {
        case 'B':  exp =  0; break;
        case 'K':  exp =  3; break;
        case 'M':  exp =  6; break;
        case 'G':  exp =  9; break;
        case 'T':  exp = 12; break;
        case 'E':  exp = 15; break;
        case 'Z':  exp = 18; break;
        case 'Y':  exp = 21; break;

        case ' ':
        case '\t':
        case '\0': exp =  0; goto done;

        default:   return -1;
    }
    ++endptr;

    // If an 'i' or 'I' is present use SI factor-of-1000 units
    if (toupper(*endptr) == 'I') {
        ++endptr;
        unit = 1000;
    }

    // Next character must be one of B/empty/whitespace
    switch (toupper(*endptr)) {
        case 'B':
        case ' ':
        case '\t': ++endptr;  break;

        case '\0': goto done;

        default:   return -1;
    }

    // Skip any remaining white space
    while (isspace(*endptr)) ++endptr;

    // Parse error on anything but a null terminator
    if (*endptr) return -1;

done:
    return exp ? coeff * pow(unit, exp / 3) : coeff;
}

15 January 2011

Using Argp with MPI-based applications

Argp is a great, great parser for command line options. When using it in MPI-based applications, there's a catch in that you want only one MPI rank to print --help information, usage warnings, error messages, etc. Otherwise, you get a whole mess of repeated, jumbled output as each MPI rank squawks about the same problem.

Here's a wrapper for Argp for use in MPI-based applications that solves just this nuisance:

#include <stdio.h>
#include <unistd.h>
#include "argp.h"

/**
 * Call <a href="http://www.gnu.org/s/libc/manual/html_node/Argp.html"
 * >Argp</a>'s \c argp_parse in an MPI-friendly way.  Processes
 * with nonzero rank will have their \c stdout and \c stderr redirected
 * to <tt>/dev/null</tt> during \c argp_parse.
 *
 * @param rank MPI rank of this process.  Output from \c argp_parse
 *             will only be observable from rank zero.
 * @param argp      Per \c argp_parse semantics.
 * @param argc      Per \c argp_parse semantics.
 * @param argv      Per \c argp_parse semantics.
 * @param flags     Per \c argp_parse semantics.
 * @param arg_index Per \c argp_parse semantics.
 * @param input     Per \c argp_parse semantics.
 *
 * @return Per \c argp_parse semantics.
 */
error_t mpi_argp_parse(const int rank,
                       const struct argp *argp,
                       int argc,
                       char **argv,
                       unsigned flags,
                       int *arg_index,
                       void *input);

error_t mpi_argp_parse(const int rank,
                       const struct argp *argp,
                       int argc,
                       char **argv,
                       unsigned flags,
                       int *arg_index,
                       void *input)
{
    // Flush stdout, stderr
    if (fflush(stdout))
        perror("mpi_argp_parse error flushing stdout prior to redirect");
    if (fflush(stderr))
        perror("mpi_argp_parse error flushing stderr prior to redirect");

    // Save stdout, stderr so we may restore them later
    int stdout_copy, stderr_copy;
    if ((stdout_copy = dup(fileno(stdout))) < 0)
        perror("mpi_argp_parse error duplicating stdout");
    if ((stderr_copy = dup(fileno(stderr))) < 0)
        perror("mpi_argp_parse error duplicating stderr");

    // On non-root processes redirect stdout, stderr to /dev/null
    if (rank) {
        if (!freopen("/dev/null", "a", stdout))
            perror("mpi_argp_parse error redirecting stdout");
        if (!freopen("/dev/null", "a", stderr))
            perror("mpi_argp_parse error redirecting stderr");
    }

    // Invoke argp per http://www.gnu.org/s/libc/manual/html_node/Argp.html
    error_t retval = argp_parse(argp, argc, argv, flags, arg_index, input);

    // Flush stdout, stderr again
    if (fflush(stdout))
        perror("mpi_argp_parse error flushing stdout after redirect");
    if (fflush(stderr))
        perror("mpi_argp_parse error flushing stderr after redirect");

    // Restore stdout, stderr
    if (dup2(stdout_copy, fileno(stdout)) < 0)
        perror("mpi_argp_parse error reopening stdout");
    if (dup2(stderr_copy, fileno(stderr)) < 0)
        perror("mpi_argp_parse error reopening stderr");

    // Close saved versions of stdout, stderr
    if (close(stdout_copy))
        perror("mpi_argp_parse error closing stdout_copy");
    if (close(stderr_copy))
        perror("mpi_argp_parse error closing stderr_copy");

    // Clear any errors that may have occurred on stdout, stderr
    clearerr(stdout);
    clearerr(stderr);

    // Return what argp_parse returned
    return retval;
}

04 January 2011

C header-only unit testing with FCTX

Just a quick hat tip to FCTX, a library I've found invaluable these past few months. FCTX provides header-only unit testing for C. Sure, if you're in C++ land there's a ton of xUnit-like frameworks available (with Boost.Test being my favorite), but for vanilla C projects FCTX wins hands down.

As an example, here's something I put together for a Stack Overflow response: The logic isn't rocket science, of course. But testing it in C without resorting to external libraries and complicated makefiles shouldn't be rocket science either. Provided that fct.h is in the same directory, this source will compile and run.