Posts

Zix 0.4.2

Zix 0.4.2 has been released. Zix is a lightweight C library of portability wrappers and data structures.

Changes:

  • Clean up documentation build
  • Fix documentation build in a virtualenv
  • Improve test suite code coverage

Zix 0.4.0

Zix 0.4.0 has been released. Zix is a lightweight C library of portability wrappers and data structures.

In Search of the Ultimate Compile-Time Configuration System

One of the many programming side quests I embark on from time to time is finding the best way to do compile-time configuration in C and C++ code. This is one of those characteristically C things that most projects need to do, but that has no well-established best practice. What you can find is all over the place, and often pretty half-baked just to suit the particularities of the "official" build. Let's try to come up with something better.

Ideal requirements:

  • Ability to enable or disable any features from the command line by defining symbols, including the ability to override or completely disable any automatic checks implemented in the code.

  • Good integration with, but no hard dependency on, any build system.

  • The code should build with reasonable defaults when simply thrown at a compiler "as-is".

  • Mistakes, such as forgetting to include the configuration header or using misspelled symbols, are caught by tooling (preferably compiler warnings).

  • It's never necessary to modify the code to achieve a particular build.

Here's a skeleton of the best I've managed to come up with so far, for a made-up "mylib" project and a few POSIX functions. It has a bit of boilerplate, but there are good reasons for everything, which I'll get to. This configuration header is written manually (not generated) and included (privately) in the source code:

#ifndef MYLIB_CONFIG_H
#define MYLIB_CONFIG_H

#if !defined(MYLIB_NO_DEFAULT_CONFIG)

// Derive default configuration from the build environment

// We need unistd.h to check _POSIX_VERSION
#  ifdef __has_include
#    if __has_include(<unistd.h>)
#      include <unistd.h>
#    endif
#  elif defined(__unix__)
#    include <unistd.h>
#  endif

// Define MYLIB_POSIX_VERSION unconditionally for convenience below
#  if defined(_POSIX_VERSION)
#    define MYLIB_POSIX_VERSION _POSIX_VERSION
#  else
#    define MYLIB_POSIX_VERSION 0
#  endif

// POSIX.1-2001: fileno()
#  if !defined(HAVE_FILENO)
#    if MYLIB_POSIX_VERSION >= 200112L || defined(_WIN32)
#      define HAVE_FILENO 1
#    endif
#  endif

// POSIX.1-2001: posix_fadvise()
#  if !defined(HAVE_POSIX_FADVISE)
#    if MYLIB_POSIX_VERSION >= 200112L && !defined(__APPLE__)
#      define HAVE_POSIX_FADVISE 1
#    endif
#  endif

#endif // !defined(MYLIB_NO_DEFAULT_CONFIG)

// Define USE variables for use in the code

#if defined(HAVE_FILENO) && HAVE_FILENO
#  define USE_FILENO 1
#else
#  define USE_FILENO 0
#endif

#if defined(HAVE_POSIX_FADVISE) && HAVE_POSIX_FADVISE
#  define USE_POSIX_FADVISE 1
#else
#  define USE_POSIX_FADVISE 0
#endif

#endif // MYLIB_CONFIG_H

User interface:

  • By default, features are enabled if they can be detected or assumed to be available from the build environment, unless MYLIB_NO_DEFAULT_CONFIG is defined, which disables everything by default to allow complete control.

  • If a symbol like HAVE_SOMETHING is defined to non-zero, then the "something" feature is assumed to be available. If it is zero, then the feature is disabled.

Usage in code:

  • To check for a feature, the configuration header must be included, and the symbol like USE_SOMETHING (not HAVE_SOMETHING) used as a boolean in an #if expression, like:

    #include "mylib_config.h"
    
    // [snip]
    
    #if USE_FILENO
        int fd = fileno(file);
    #endif
    
  • None of the other configuration symbols described here may be used directly. In particular, the configuration header should be the only place in the code that touches HAVE symbols.

The main "trick" here which allows for all of the different configuration "modes" is the use of two "kinds" of symbol: HAVE symbols and USE symbols. HAVE symbols are exclusively the interface for the user or build system, and USE symbols are the opposite: exclusively for use in the code and never by the user or build system. This way, use of the configuration header is mandatory for any code that needs configuration.

The USE symbols are defined to 0 or 1 unconditionally, and code must check them with #if, not with #ifdef. This prevents mistakes, since both forgetting to include the configuration header and misspelling a symbol leave an undefined identifier in an #if expression, which compilers can warn about (for example, with -Wundef in GCC and Clang). Tools like include-what-you-use can also enforce direct inclusion more strictly.

From the command line, basic usage is typical: define symbols like HAVE_SOMETHING to enable features. For complete control over the configuration, define MYLIB_NO_DEFAULT_CONFIG, in which case all features must be explicitly enabled. This is mainly useful for build systems, so that all features can be checked for and only those that are found used in the code. It's also useful for avoiding issues with strange compilers or platforms that aren't supported by the checks.
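To make the HAVE/USE split concrete, here is a minimal single-file sketch. It inlines a cut-down version of the configuration header so that it can be compiled on its own; in a real project that part would live in mylib_config.h. The function name mylib_stream_fd is made up for illustration, and defining _POSIX_C_SOURCE is my assumption about what is needed for fileno() to be declared in strict standard modes:

```c
// Assumption: request POSIX declarations even in strict -std=c11 mode
#ifndef _POSIX_C_SOURCE
#  define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stdio.h>

// ---- Would normally be in mylib_config.h ----

#if !defined(MYLIB_NO_DEFAULT_CONFIG)

// We need unistd.h to check _POSIX_VERSION
#  ifdef __has_include
#    if __has_include(<unistd.h>)
#      include <unistd.h>
#    endif
#  elif defined(__unix__)
#    include <unistd.h>
#  endif

// Only detect fileno() if the user has not decided already
#  if !defined(HAVE_FILENO)
#    if (defined(_POSIX_VERSION) && _POSIX_VERSION >= 200112L) || \
      defined(_WIN32)
#      define HAVE_FILENO 1
#    endif
#  endif

#endif // !defined(MYLIB_NO_DEFAULT_CONFIG)

// The USE symbol is always defined, to 0 or 1
#if defined(HAVE_FILENO) && HAVE_FILENO
#  define USE_FILENO 1
#else
#  define USE_FILENO 0
#endif

// ---- Would normally be in a source file that includes the header ----

// Return the descriptor behind a stream, or -1 if fileno() is not in use
static int mylib_stream_fd(FILE* const file)
{
  assert(file);
#if USE_FILENO
  return fileno(file);
#else
  (void)file;
  return -1;
#endif
}
```

Compiling this with -DHAVE_FILENO=0 forces the fallback branch, and -DMYLIB_NO_DEFAULT_CONFIG skips detection so that only explicitly defined HAVE symbols take effect. If the check were misspelled as USE_FILNO, -Wundef would report it, where a plain #ifdef would have silently taken the wrong branch.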

I think this design covers all of the above requirements, and while the header itself can get a bit verbose, it's relatively straightforward and, more importantly, usage of it is simple and resilient to mistakes.

There is one thing here that isn't caught by tooling though: misspelling a HAVE variable will silently not work. This is a concession to the simple case of just defining a few relevant HAVE symbols on the command line, and to keep command lines from the build system as terse as possible. It is however possible to modify this pattern a bit to catch this potential mistake as well: require all known HAVE variables to be defined to 1 or 0, and check those with #if as well in the configuration header itself. This adds a couple of lines per check to the boilerplate, for example:

// POSIX.1-2001, Windows: fileno()
#  ifndef HAVE_FILENO
#    if defined(_WIN32) || (defined(_POSIX_VERSION) && _POSIX_VERSION >= 200112L)
#      define HAVE_FILENO 1
#    else
#      define HAVE_FILENO 0
#    endif
#  endif

// [snip]

#if HAVE_FILENO
#  define USE_FILENO 1
#else
#  define USE_FILENO 0
#endif

This way, compiler warnings will catch any mistakes in the build system (because, for example, HAVE_FILENO isn't defined), ensuring that everything is explicitly either enabled or disabled. I'm not sure which style to use. Potential silent errors in the build system are pretty bad, but at the same time, I don't want to sacrifice the ability of the code to be easily compiled "manually". It's probably possible to have both, but I'm not sure how painful the boilerplate cost would be. I did have the stricter version for a while, but the extremely verbose compiler command lines were pretty annoying, so I removed it. Now, as I write this, I'm second-guessing myself, but so it goes.
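For what it's worth, one hypothetical way to have both is an opt-in strict switch. In this sketch, MYLIB_STRICT_CONFIG is an invented name that is not part of the design above, and the automatic detection is reduced to a simplified stand-in for the real checks. It only shows the shape of the boilerplate, which would have to be repeated for every feature:

```c
// Hypothetical opt-in: a build system defines MYLIB_STRICT_CONFIG, and must
// then define every HAVE symbol explicitly
#if defined(MYLIB_STRICT_CONFIG)
#  if !defined(HAVE_FILENO)
#    error "MYLIB_STRICT_CONFIG requires HAVE_FILENO to be defined to 0 or 1"
#  endif
#elif !defined(HAVE_FILENO)
// Relaxed mode: detect automatically (simplified stand-in for real checks)
#  if defined(_WIN32) || defined(__unix__) || defined(__APPLE__)
#    define HAVE_FILENO 1
#  else
#    define HAVE_FILENO 0
#  endif
#endif

// Either way, HAVE_FILENO is defined by now, so #if is safe here
#if HAVE_FILENO
#  define USE_FILENO 1
#else
#  define USE_FILENO 0
#endif
```

In strict mode, a misspelled -DHAVE_FILENOO=1 from the build system stops the build with an error instead of silently disabling the feature, while a manual build without the switch still works with no definitions at all. The cost is roughly three extra lines per feature.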

Questions for another day, I suppose. One of the things about programming side quests is that they usually never end.

C++ Bindings

For some C libraries, I'd like to include "official" C++ bindings to make life easier for people using them from C++ (which, in the audio world, is most people). However, that's not something I know a good pattern for, in terms of project organization, installation, versioning, and so on. Figuring one out is a trickier problem than it may seem at first.

In the - in this case literally - "C and C++" world, there is a notorious lack of consistent conventions and best practices in some areas, and this seems to be one of them. So, I suppose I will have to suss out the "best" (and least weird) way myself. The "best" way should:

  • Provide "official" C++ bindings which are developed, maintained, and shipped with the underlying C library.

  • Avoid having the C++ wrapper be locked to the same version as the C library (which is a strict semver reflection of the ABI).

    Rationale: It must be possible to develop the C++ bindings, including making breaking changes, while the C library version (and therefore the ABI) stays the same. Otherwise, it would be nearly impossible to change them, because that'd require changing the version... but the underlying C API version needs to break as infrequently as possible.

  • Isolate the bindings (and "C++ stuff") from the underlying C library as much as possible. Ensure that builds on systems without a C++ toolchain work (this isn't uncommon on minimal or embedded systems which much of this software is appropriate for use on).

  • Avoid making a completely separate new project (repository, test and releasing infrastructure, and so on) if at all possible. The maintenance burden would be far too high, and the bindings would be prone to rot.

  • Use a simple and predictable naming scheme that works with any "main" project name.

After poking around repositories and tinkering a little bit, the best practices I can come up with (for the sort of libraries I'm thinking about, anyway) are:

  • Develop and release bindings as a sub-project within the "main" project.

    This is only a "project" in the build system sense. The bindings are maintained in the same git repository, and released in the same archive, as the C library.

  • Name the bindings sub-project by appending a cpp suffix, for example, mylib-cpp. This scheme is... well, not uncommon (for example, in the Debian repositories), and can easily be applied to any name, including libraries that already have multi-word names.

    Following meson requirements, this means the sub-project lives at a path like subprojects/mylib-cpp in the repository.

  • Install a separate "package" (for example via pkg-config) for the bindings, which depends on the one for the underlying C library. The major version is appended to both, for example, mylib-cpp-1 might depend on mylib-1.

  • Keep the C++ bindings themselves as light as possible, and header-only. This avoids link-time issues, making C++ API compatibility a compile-time issue only.

  • Give the bindings package a separate version number and let it increase as necessary. This version is not aligned with that of the underlying C library in any meaningful way. Technically, a given version of the bindings depends on some version of the C library, but in practice, this is always simply the version it's shipped with.

    A strange consequence of this scheme is that the version of the C++ bindings can only drift ever further away, so in the future even major versions may not correspond at all. This is a bit weird, but is the only way to make everything work and be properly versioned. Effectively, the version of the bindings is just an implementation detail, something developers deal with in configuration scripts. From the perspective of packagers or users, there is just one version of the library, the version of the underlying C library - the C++ bindings just may break sometimes, even within a major version of the project as a whole.

    I can't think of any concrete reason why this could be a problem: the urge to have shiny "4.0.0" type version bumps across everything at the same time smells like... marketing, frankly, not engineering. It does make parallel installation of different major versions more difficult, though. Packagers can split up the installation and make separate packages if they really want to. "Upstream" (me) officially doesn't care about parallel installation of different major versions of the C++ bindings.

    All that said, ideally they happen to stay relatively aligned anyway.

  • Make sure there is a simple and obvious option to disable C++ entirely, leaving a C library package with the broadest compatibility possible.

The short, vibes-based description of all that is something like: there is a stable and strictly versioned C library with every effort put into long-term source and binary compatibility, as always... and then there's a C++ bindings sub-project that tags along with it but is otherwise independent. The bindings are more volatile, but it's C++, so they're going to be volatile no matter what you do anyway. The bindings project is universally named by tacking a -cpp or _cpp on the end as appropriate in every context: include directories, package names, and so on.

So, an installation might look something like this:

include/dostuff-1/dostuff.h
include/dostuff-cpp-4/dostuff.hpp
lib/libdostuff-1.so
lib/libdostuff-1.so.1
lib/libdostuff-1.so.1.2.4
lib/pkgconfig/dostuff-1.pc
lib/pkgconfig/dostuff-cpp-4.pc
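
For such an installation, the pkg-config file for the bindings might look something like the following sketch. The prefix, the description, and the exact version numbers are made up for illustration. The dependency on the C library is expressed with Requires, and since the bindings are header-only, there is no Libs line:

```
# lib/pkgconfig/dostuff-cpp-4.pc (hypothetical contents)
prefix=/usr
includedir=${prefix}/include

Name: dostuff-cpp
Description: C++ bindings for dostuff
Version: 4.0.0
Requires: dostuff-1 >= 1.2.4
Cflags: -I${includedir}/dostuff-cpp-4
```

A C++ project then asks only for dostuff-cpp-4, and pkg-config pulls in the compiler and linker flags of dostuff-1 through the Requires line.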

In the source code, the bindings and any supporting C++ code are entirely contained within the subproject, except for a minimal skeleton to handle compile-time options and so on. This can be more work than a single heterogeneous project in some ways, less work in others, but overall I think it has more maintenance benefits. Importantly, it keeps any new issues or volatility as far away from the C library as possible, making it easy to see if a change could possibly break the ABI or the C library at all, for example.

This scheme may be extended to other languages if that's appropriate. The naming scheme for Python is like python-dostuff. It probably makes more sense to maintain Cython wrappers as separate projects maintained in the Python way (sigh...), but the whole point of a naming scheme is to have space for things in case you need them. In reality, language bindings are usually done independently by other people in separate projects (Rust folks will use Cargo in a separate repository, and so on).

All of this is, obviously, a massively over-thought bikeshed, but adding multiple programming languages and multiple versioning and compatibility schemes/philosophies to a project is a bit tricky. I can't just copy from an existing best-practice pattern I've been honing for years like I can with straight C libraries. This approach seems like it shouldn't cause too much trouble, though.

That said, I'm just making this up as I go along and have no experience maintaining anything quite like this (only more or less homogeneous C or C++ libraries), so feedback is, as always, welcome. I may revise this post if anything turns out to be a mistake, so it can ultimately serve as a reference for the next person trying to figure out how to do "C family" source code releases right.

Beautiful C and C++ Documentation with Sphinx

Like many, I've long suffered under the antiquated and inflexible HTML documentation generated by Doxygen. Having recently worked on some Python documentation using Sphinx, though, I found it powerful and pleasant enough to use. It also has a way of encouraging actually writing documentation, rather than just generating a dump of glorified comments, which is a good thing. Though I'm not at all a fan of reStructuredText syntax (which at times seems like it's trying to be cryptic on purpose), Sphinx is undeniably powerful, and I like the "assemble a bunch of plainish text files" approach in general. The support for multiple languages is also very appealing, though not without its problems, as we'll get to.

So, is it possible to use Sphinx to generate documentation for C and C++ libraries? Yes! As explained somewhat recently in a post by Sy Brand, there is a project called Breathe that integrates Doxygen (for extracting documentation) with Sphinx (for generating output). That sounded promising, so I attempted to migrate a library to using Breathe instead of Doxygen's HTML support. Unfortunately, though, I encountered quite a few roadblocks where I couldn't quite get output that I was happy with. Worse, the project itself is very complicated, and as I poked around in swaths of originally generated but manually modified code, I decided that Breathe was not for me. That would feel like just exchanging one inflexible and unhackable system for another.

What, then, to do? Though I realize that deep integration via modules like Breathe is usually the way things are done with Sphinx, I am a KISS sort of person, so I like to think of it as something more like a Static Site Generator: it reads a bunch of plainish text input files, and outputs HTML (or whatever other presentation format). How do we describe C and C++ things in Sphinx? It turns out that recent versions have built-in support for these "domains" now, which define markup for describing everything in these languages. This means that everything to do with nicely formatting and cross-referencing C and C++ is already dealt with out of the box. Excellent.
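As a rough idea of what that markup looks like, here is a small reStructuredText fragment using the C domain. The function mylib_stream_fd is a made-up example, not something from a real library, and this is only a sketch of the kind of text a conversion script would need to emit:

```rst
.. c:function:: int mylib_stream_fd(FILE* file)

   Return the descriptor behind a stream, or -1 if it has none.

   :param file: An open stream.

Elsewhere in the text, :c:func:`mylib_stream_fd` becomes a link to the
entry above.
```

The cpp domain works along the same lines, with directives like cpp:function and cpp:class.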

So, taking a step back and assessing the situation: we have some XML files that describe the documentation, and we have a tool that reads text files and produces nice documentation. This strikes me as a relatively straightforward task for a nice and simple "files in, files out" script, not somewhere a Goldbergian contraption that mashes Doxygen into Sphinx is required. So, after investigating any other promising options (no such luck), I resigned myself to trying to write such a thing, at the very least to see if it's feasible. I certainly have no time or interest in writing and maintaining a Documentation System, but a self-contained script to convert one thing to another seems reasonable enough.

As it turns out, I wouldn't call it trivial, but it's certainly feasible. I ended up with a ~700 line Python script that does everything I need (though this is of course not the same as everything possible). It's a bit "gluey" and makes some assumptions about the structure and so on, but it does the job and is something I feel I can maintain as necessary. I won't be publishing or supporting this as an independent project any time soon, and make no claims about it being general purpose, but feel free to steal it if any of this sounds appealing.

With this, I was able to get around some long-standing gripes I have with Doxygen, and easily make whatever I wanted to happen a reality, so I'm pretty happy with this approach. Everything is nicely decoupled, so I don't feel over-invested in any of the tools involved. If, for example, someone finally writes a good clang-based extractor that gains traction (JSON please, I did not enjoy this revisitation of the horrors of XML at all), I should be able to switch to using that easily enough. I've actually found this somewhat crude and UNIXey approach quite convenient: you can simply look at the ReST files to understand what is happening, or tweak them a bit and run Sphinx to test what you're aiming for, and so on. Text files are good.

So, after however many years, I think I've found an approach to documentation I'm actually quite happy with, that can support all of the languages that I use, and in general doesn't seem to get in my way. Hooray. For starters, I did my window system portability layer, Pugl. The generated documentation for the C API can be seen at https://lv2.gitlab.io/pugl/c/singlehtml/, and the C++ at https://lv2.gitlab.io/pugl/cpp/singlehtml/. This is more or less the standard Alabaster theme with a few tweaks, which I'm not sure feels appropriate for API documentation (and is much more bloated with a bunch of JavaScript than I'd like), but it's pretty enough, at least. I'll tinker with themes later when I feel like jumping down that rabbit hole.

The slightly cumbersome links are an artifact of the one problem I encountered using Sphinx domains: you can't really document C and C++ APIs nicely in the same documentation set. If you use the cpp domain everywhere, you get name mangling in links even for C symbols, which is really unfortunate, and you can't really mix them. To take a contrived example, if you have a struct MylibThing in C, then a type alias in C++ like using Thing = MylibThing, Sphinx isn't clever enough to figure out that MylibThing is from C, and will generate warnings and not link correctly. Perhaps someday it will, which would be nice, but for now I opted to simply generate completely separate documentation sets. This means the C documentation is duplicated in the C++ documentation so that things can be hyperlinked, which isn't ideal, but I can live with it. A certain amount of redundancy is inherent in multi-language documentation anyway.

As I add Python bindings to most libraries, having a unified documentation system for all of these languages will be very nice. There is one additional thing I'll need at some point for the LV2 documentation in particular: a domain for RDF properties and classes. The LV2 documentation really suffers from an unnatural code (via Doxygen) and data (via lv2specgen) documentation split, and my hope is that Sphinx can provide a nice environment for writing documentation that refers to both worlds freely. That, unfortunately, will be much more work, but hopefully writing a custom Sphinx domain isn't too hard...
