Posts

C++ Bindings

For some C libraries, I'd like to include "official" C++ bindings to make life easier for people using them from C++ (which in the audio world, is most). However, that's not something I know a good pattern for, in terms of project organization, installation, versioning, and so on. Figuring one out is a trickier problem than it may seem at first.

In the - in this case literally - "C and C++" world, there is a notorious lack of consistent conventions and best practices in some areas, and this seems to be one of them. So, I suppose I will have to suss out the "best" (and least weird) way myself. The "best" way should:

  • Provide "official" C++ bindings which are developed, maintained, and shipped with the underlying C library.

  • Avoid having the C++ wrapper be locked to the same version as the C library (which is a strict semver reflection of the ABI).

    Rationale: It must be possible to develop the C++ bindings, including make breaking changes, while the C library version (and therefore the ABI) stays the same. Otherwise, it would be nearly impossible to change them, because that'd require changing the version... but the underlying C API version needs to break as infrequently as possible.

  • Isolate the bindings (and "C++ stuff") from the underlying C library as much as possible. Ensure that builds on systems without a C++ toolchain work (this isn't uncommon on minimal or embedded systems which much of this software is appropriate for use on).

  • Avoid making a completely separate new project (repository, test and releasing infrastructure, and so on) if at all possible. The maintenance burden would be far too high, and the bindings would be prone to rot.

  • Use a simple and predictable naming scheme that works with any "main" project name.

Poking around repositories and tinkering a little bit, the best practices I can come up with (for the sort of libraries I'm thinking about anyway), is:

  • Develop and release bindings as a sub-project within the "main" project.

    This is only a "project" in the build system sense. The bindings are maintained in the same git repository, and released in the same archive, as the C library.

  • Name the bindings sub-project by appending a cpp suffix, for example, mylib-cpp. This scheme is... well, not uncommon (for example, in the Debian repositories), and can easily be applied to any name, including libraries that already have multi-word names.

    Following meson requirements, this means the sub-project lives at a path like subprojects/mylib-cpp in the repository.

  • Install a separate "package" (for example via pkg-config) for the bindings, which depends on the one for the underlying C library. The major version is appended to both, for example, mylib-cpp-1 might depend on mylib-1.

  • Keep the C++ bindings themselves as light as possible, and header-only. This avoids link-time issues, making C++ API compatibility a compile-time issue only.

  • Give the bindings package a separate version number and let it increase as necessary. This version is not aligned with that of the underlying C library in any meaningful way. Technically, a given version of the bindings depends on some version of the C library, but in practice, this is always simply the version it's shipped with.

    A strange consequence of this scheme is that the version of the C++ bindings can only drift ever further away, so in the future even major versions may not correspond at all. This is a bit weird, but is the only way to make everything work and be properly versioned. Effectively, the version of the bindings is just an implementation detail, something developers deal with in configuration scripts. From the perspective of packagers or users, there is just one version of the library, the version of the underlying C library - the C++ bindings just may break sometimes, even within a major version of the project as a whole.

    I can't think of any concrete reason why this could be a problem: the urge to have shiny "4.0.0" type version bumps across everything at the same time smells like... marketing, frankly, not engineering. It does make parallel installation of different major versions more difficult, though. Packagers can split up the installation and make separate packages if they really want to. "Upstream" (me) officially doesn't care about parallel installation of different major versions of the C++ bindings.

    All that said, ideally they happen to stay relatively aligned anyway.

  • Make sure there is a simple and obvious option to disable C++ entirely, leaving a C library package with the broadest compatibility possible.

The short, vibes-based description of all that is something like: there is a stable and strictly versioned C library with every effort put into long-term source and binary compatibility, as always... and then there's a C++ bindings sub-project that tags along with it but is otherwise independent. The bindings are more volatile, but it's C++, so they're going to be volatile no matter what you do anyway. The bindings project is universally named by tacking a -cpp or _cpp on the end as appropriate in every context: include directories, package names, and so on.

So, an installation might look something like this:

include/dostuff-1/dostuff.h
include/dostuff-cpp-4/dostuff.hpp
lib/libdostuff-1.so
lib/libdostuff-1.so.1
lib/libdostuff-1.so.1.2.4
lib/pkgconfig/dostuff-1.pc
lib/pkgconfig/dostuff-cpp-4.pc

In the source code, the bindings and any supporting C++ code is entirely contained within the subproject, except for a minimal skeleton to handle compile time options and so on. This can be more work than a single heterogeneous project in some ways, less work in others, but overall I think it has more maintenance benefits. Importantly, it keeps any new issues or volatility as far away from the C library as possible, making it easy to see if a change could possibly break the ABI or the C library at all, for example.

This scheme may be extended to other languages if that's appropriate. The naming scheme for Python is like python-dostuff. It probably makes more sense to maintain Cython wrappers as separate projects maintained in the Python way (sigh...), but the whole point of a naming scheme is to have space for things in case you need them. In reality, language bindings are usually done independently by other people in separate projects (Rust folks will use Cargo in a separate repository, and so on).

All of this is, obviously, a massively over-thought bikeshed, but adding multiple programming languages and multiple versioning and compatibility schemes/philosophies to a project is a bit tricky. I can't just copy from an existing best-practice pattern I've been honing for years like I can with straight C libraries. This approach seems like it shouldn't cause too much trouble, though.

That said, I'm just making this up as I go along and have no experience maintaining anything quite like this (only more or less homogeneous C or C++ libraries), so feedback is, as always, welcome. I may revise this post if anything turns out to be a mistake, so it can ultimately serve as a reference for the next person trying to figure out how to do "C family" source code releases right.

Beautiful C and C++ Documentation with Sphinx

Like many, I've long suffered under the antiquated and inflexible HTML documentation generated by Doxygen. Having recently worked on some Python documentation using Sphinx, though, I found it powerful and pleasant enough to use. It also has a way of encouraging actually writing documentation, rather than just generating a dump of glorified comments, which is a good thing. Though I'm not at all a fan of ReStructuredText syntax (which at times seems like it's trying to be cryptic on purpose), Sphinx is undeniably powerful, and I like the "assemble a bunch of plainish text files" approach in general. The support for multiple languages is also very appealing, though not without its problems, as we'll get to.

So, is it possible to use Sphinx to generate documentation for C and C++ libraries? Yes! As explained somewhat recently in a post by Sy Brand, there is a project called Breathe that integrates Doxygen (for extracting documentation) with Sphinx (for generating output). That sounded promising, so I attempted to migrate a library to using Breathe instead of Doxygen's HTML support. Unfortunately, though, I encountered quite a few roadblocks where I couldn't quite get output that I was happy with. Worse, the project itself is very complicated, and as I poked around in swaths of originally generated but manually modified code, I decided that Breathe was not for me. That would feel like just exchanging one inflexible and unhackable system for another.

What, then, to do? Though I realize that deep integration via modules like Breathe is usually the way things are done with Sphinx, I am a KISS sort of person, so I like to think of it as something more like a Static Site Generator: it reads a bunch of plainish text input files, and outputs HTML (or whatever other presentation format). How do we describe C and C++ things in Sphinx? It turns out that recent versions have built-in support for these "domains" now, which define markup for describing everything in these languages. This means that everything to do with nicely formatting and cross-referencing C and C++ is already dealt with out of the box. Excellent.

So, taking a step back and assessing the situation: we have some XML files that describe the documentation, and we have a tool that reads text files and produces nice documentation. This strikes me as a relatively straightforward task for a nice and simple "files in, files out" script, not somewhere a Goldbergian contraption that mashes Doxygen into Sphinx is required. So, after investigating any other promising options (no such luck), I resigned myself to trying to write such a thing, at the very least to see if it's feasible. I certainly have no time or interest in writing and maintaining a Documentation System, but a self-contained script to convert one thing to another seems reasonable enough.

As it turns out, I wouldn't call it trivial, but it's certainly feasible. I ended up with a ~700 line Python script that does everything I need (though this is of course not the same as everything possible). It's a bit "gluey" and makes some assumptions about the structure and so on, but it does the job and is something I feel I can maintain as necessary. I won't be publishing or supporting this as an independent project any time soon, and make no claims about it being general purpose, but feel free to steal it if any of this sounds appealing.

With this, I was able to get around some long-standing gripes I have with Doxygen, and easily make whatever I wanted to happen a reality, so I'm pretty happy with this approach. Everything is nicely decoupled, so I don't feel over-invested in any of the tools involved. If, for example, someone finally writes a good clang-based extractor that gains traction (JSON please, I did not enjoy this revisitation of the horrors of XML at all), I should be able to switch to using that easily enough. I've actually found this somewhat crude and UNIXey approach quite convenient: you can simply look at the ReST files to understand what is happening, or tweak them a bit and run Sphinx to test what you're aiming for, and so on. Text files are good.

So, after however many years, I think I've found an approach to documentation I'm actually quite happy with, that can support all of the languages that I use, and in general doesn't seem to get in my way. Hooray. For starters, I did my window system portability layer, Pugl. The generated documentation for the C API can be seen at https://lv2.gitlab.io/pugl/c/singlehtml/, and the C++ at https://lv2.gitlab.io/pugl/cpp/singlehtml/. This is more or less the standard Alabaster theme with a few tweaks, which I'm not sure feels appropriate for API documentation (and is much more bloated with a bunch of Javascript than I'd like), but it's pretty enough, at least. I'll tinker with themes later when I feel like jumping down that rabbit hole.

The slightly cumbersome links are an artifact of the one problem I encountered using Sphinx domains: you can't really document C and C++ APIs nicely in the same documentation set. If you use the cpp domain everywhere, you get name mangling in links even for C symbols, which is really unfortunate, and you can't really mix them. To take a contrived example, if you have a struct MylibThing in C, then a type alias in C++ like using Thing = MylibThing, Sphinx isn't clever enough to figure out that MylibThing is from C, and will generate warnings and not link correctly. Perhaps someday it will, which would be nice, but for now I opted to simply generate completely separate documentation sets. This means the C documentation is duplicated in the C++ documentation so that things can be hyperlinked, which isn't ideal, but I can live with it. A certain amount of redundancy is inherent in multi-language documentation anyway.

As I add Python bindings to most libraries, having a unified documentation system for all of these languages will be very nice. There is one additional thing I'll need at some point for the LV2 documentation in particular: a domain for RDF properties and classes. The LV2 documentation really suffers from an unnatural code (via Doxygen) and data (via lv2specgen) documentation split, and my hope is that Sphinx can provide a nice environment for writing documentation that refers to both worlds freely. That, unfortunately, will be much more work, but hopefully writing a custom Sphinx domain isn't too hard...

Page 1 / 1