C++ Bindings

Posted on 2023-01-22 23:46 in Software, C, Cpp, LAD

For some C libraries, I'd like to include "official" C++ bindings to make life easier for people using them from C++ (which in the audio world, is most). However, that's not something I know a good pattern for, in terms of project organization, installation, versioning, and so on. Figuring one out is a trickier problem than it may seem at first.

In the - in this case literally - "C and C++" world, there is a notorious lack of consistent conventions and best practices in some areas, and this seems to be one of them. So, I suppose I will have to suss out the "best" (and least weird) way myself. The "best" way should:

Provide "official" C++ bindings which are developed, maintained, and shipped with the underlying C library.
Avoid having the C++ wrapper be locked to the same version as the C library (which is a strict semver reflection of the ABI).

Rationale: It must be possible to develop the C++ bindings, including make breaking changes, while the C library version (and therefore the ABI) stays the same. Otherwise, it would be nearly impossible to change them, because that'd require changing the version... but the underlying C API version needs to break as infrequently as possible.
Isolate the bindings (and "C++ stuff") from the underlying C library as much as possible. Ensure that builds on systems without a C++ toolchain work (this isn't uncommon on minimal or embedded systems which much of this software is appropriate for use on).
Avoid making a completely separate new project (repository, test and releasing infrastructure, and so on) if at all possible. The maintenance burden would be far too high, and the bindings would be prone to rot.
Use a simple and predictable naming scheme that works with any "main" project name.

Poking around repositories and tinkering a little bit, the best practices I can come up with (for the sort of libraries I'm thinking about anyway), is:

Develop and release bindings as a sub-project within the "main" project.

This is only a "project" in the build system sense. The bindings are maintained in the same git repository, and released in the same archive, as the C library.
Name the bindings sub-project by appending a cpp suffix, for example, mylib-cpp. This scheme is... well, not uncommon (for example, in the Debian repositories), and can easily be applied to any name, including libraries that already have multi-word names.

Following meson requirements, this means the sub-project lives at a path like subprojects/mylib-cpp in the repository.
Install a separate "package" (for example via pkg-config) for the bindings, which depends on the one for the underlying C library. The major version is appended to both, for example, mylib-cpp-1 might depend on mylib-1.
Keep the C++ bindings themselves as light as possible, and header-only. This avoids link-time issues, making C++ API compatibility a compile-time issue only.
Give the bindings package a separate version number and let it increase as necessary. This version is not aligned with that of the underlying C library in any meaningful way. Technically, a given version of the bindings depends on some version of the C library, but in practice, this is always simply the version it's shipped with.

A strange consequence of this scheme is that the version of the C++ bindings can only drift ever further away, so in the future even major versions may not correspond at all. This is a bit weird, but is the only way to make everything work and be properly versioned. Effectively, the version of the bindings is just an implementation detail, something developers deal with in configuration scripts. From the perspective of packagers or users, there is just one version of the library, the version of the underlying C library - the C++ bindings just may break sometimes, even within a major version of the project as a whole.

I can't think of any concrete reason why this could be a problem: the urge to have shiny "4.0.0" type version bumps across everything at the same time smells like... marketing, frankly, not engineering. It does make parallel installation of different major versions more difficult, though. Packagers can split up the installation and make separate packages if they really want to. "Upstream" (me) officially doesn't care about parallel installation of different major versions of the C++ bindings.

All that said, ideally they happen to stay relatively aligned anyway.
Make sure there is a simple and obvious option to disable C++ entirely, leaving a C library package with the broadest compatibility possible.

The short, vibes-based description of all that is something like: there is a stable and strictly versioned C library with every effort put into long-term source and binary compatibility, as always... and then there's a C++ bindings sub-project that tags along with it but is otherwise independent. The bindings are more volatile, but it's C++, so they're going to be volatile no matter what you do anyway. The bindings project is universally named by tacking a -cpp or _cpp on the end as appropriate in every context: include directories, package names, and so on.

So, an installation might look something like this:

include/dostuff-1/dostuff.h
include/dostuff-cpp-4/dostuff.hpp
lib/libdostuff-1.so
lib/libdostuff-1.so.1
lib/libdostuff-1.so.1.2.4
lib/pkgconfig/dostuff-1.pc
lib/pkgconfig/dostuff-cpp-4.pc

In the source code, the bindings and any supporting C++ code is entirely contained within the subproject, except for a minimal skeleton to handle compile time options and so on. This can be more work than a single heterogeneous project in some ways, less work in others, but overall I think it has more maintenance benefits. Importantly, it keeps any new issues or volatility as far away from the C library as possible, making it easy to see if a change could possibly break the ABI or the C library at all, for example.

This scheme may be extended to other languages if that's appropriate. The naming scheme for Python is like python-dostuff. It probably makes more sense to maintain Cython wrappers as separate projects maintained in the Python way (sigh...), but the whole point of a naming scheme is to have space for things in case you need them. In reality, language bindings are usually done independently by other people in separate projects (Rust folks will use Cargo in a separate repository, and so on).

All of this is, obviously, a massively over-thought bikeshed, but adding multiple programming languages and multiple versioning and compatibility schemes/philosophies to a project is a bit tricky. I can't just copy from an existing best-practice pattern I've been honing for years like I can with straight C libraries. This approach seems like it shouldn't cause too much trouble, though.

That said, I'm just making this up as I go along and have no experience maintaining anything quite like this (only more or less homogeneous C or C++ libraries), so feedback is, as always, welcome. I may revise this post if anything turns out to be a mistake, so it can ultimately serve as a reference for the next person trying to figure out how to do "C family" source code releases right.