How to define a datatype in RDF

I had to do some digging around to figure out how to define a new Datatype with restrictions in RDF, so I thought it might make a useful post to save someone else the trouble in the future.

RDF datatypes are based on XSD datatypes, which are often used directly. Unfortunately, most implementations simply have the XSD types baked in and do not support or validate new datatype descriptions (though at least sord_validate can). Regardless, it is sometimes necessary to define a datatype with a specific restriction so it can be machine validated. It's a bit tricky to figure out how to do this, since everything is buried in specifications that aren't as triple oriented as they should be. So, here is an example of defining a datatype restricted by regular expression in Turtle, derived from the OWL documentation:

    # eg:CSymbol is an illustrative name for the new datatype
    eg:CSymbol
    a rdfs:Datatype ;
    rdfs:comment "A symbol in the C programming language" ;
    owl:onDatatype xsd:string ;
    owl:withRestrictions (
        [
            xsd:pattern "[_a-zA-Z][_a-zA-Z0-9]*"
        ]
    ) .

The XSD specification defines several “constraining facets” you can use in this way. See the XSD specification for details, but the most obvious and useful for RDF are: xsd:length, xsd:minLength, xsd:maxLength, xsd:pattern, xsd:maxInclusive, xsd:maxExclusive, xsd:minInclusive, xsd:minExclusive. For example, you can define a numeric type with restricted range like so:

    # eg:RestrictedInt is an illustrative name for the new datatype
    eg:RestrictedInt
    a rdfs:Datatype ;
    rdfs:comment "An integer between 24 and 42 inclusive" ;
    owl:onDatatype xsd:integer ;
    owl:withRestrictions (
        [
            xsd:minInclusive 24
        ] [
            xsd:maxInclusive 42
        ]
    ) .

Defining datatypes in this way and using them as the rdfs:range for properties is a good idea because it describes which values are valid in a machine readable way. This makes it possible for simple generic tools to validate data, ensuring that all literals are valid values for the property they describe.
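To give a feel for what such a generic tool does with these restrictions, here is a minimal Python sketch. It is not how sord_validate works internally; the dictionaries, the `is_valid` function, and the facet handling are all made up for illustration, and it assumes the pattern is simple enough that Python's `re` behaves like an XSD regular expression (true for the C symbol pattern, not for XSD patterns in general).

```python
import re

# Hypothetical in-memory form of two restricted datatypes:
# each dictionary maps a constraining facet name to its value.
C_SYMBOL = {"pattern": r"[_a-zA-Z][_a-zA-Z0-9]*"}
RANGED_INT = {"minInclusive": 24, "maxInclusive": 42}


def is_valid(lexical, facets, parse=str):
    """Return True if the literal's lexical form satisfies every facet."""
    # xsd:pattern must match the whole lexical form, not just a substring.
    pattern = facets.get("pattern")
    if pattern is not None and re.fullmatch(pattern, lexical) is None:
        return False
    # Convert the lexical form to a value (e.g. int for xsd:integer).
    try:
        value = parse(lexical)
    except ValueError:
        return False
    if "minInclusive" in facets and value < facets["minInclusive"]:
        return False
    if "maxInclusive" in facets and value > facets["maxInclusive"]:
        return False
    return True
```

For example, `is_valid("my_var", C_SYMBOL)` accepts the literal while `is_valid("2fast", C_SYMBOL)` rejects it, and `is_valid("43", RANGED_INT, parse=int)` fails the range check.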

LV2 1.0.0

The first unified LV2 release, LV2 1.0.0, is out.

This release merges the previous lv2core package with all the official extension packages, as well as example plugins, lv2specgen, and additional data. From a developer point of view, the biggest change is that all LV2 API headers can be used by simply checking for the single pkg-config package "lv2" (for compatibility the previous "lv2core" package is still installed). Implementations are encouraged to abandon the "copy paste headers" practice and depend on this package instead.

With this release, several new extensions have become stable that together greatly increase the power of LV2: atom, log, parameters, patch, port-groups, port-props, resize-port, state, time, worker.

Documentation and more detailed change logs

Download LV2 1.0.0

LV2 atom and state serialisation

I have been working on full round-trip serialisation for everything in the LV2 Atom extension (which also applies for implementing state). I am doing this as a small library with a simple API, with the goal that it be simple to include in any project.

svn co

Currently this only writes (using Serd); I still need to move the reading stuff into it (which will probably use a model and thus require using Sord).

The Atom extension defines a simple data model for use in LV2 plugins and hosts (or elsewhere). The general Big Idea here is to have a smallish set of primitive types, a few collections, and out of that one can build arbitrarily complex structures. Everything (including containers) is a simple and compact chunk of POD data, but serialises to/from (a subset of) RDF, so it can nicely be described in a plugin's Turtle file, among other advantages.

An easy to adopt serialisation API is important to making these advantages a reality for many implementations, so I have decided to provide one before marking these extensions stable. It also serves as a nice test case with complete coverage. Here is an example of an Atom that contains every currently defined Atom type, as well as MIDI, serialised to Turtle by sratom:

    rdf:value [
        a eg:Object ;
        eg:one "1"^^xsd:int ;
        eg:two "2"^^xsd:long ;
        eg:three "3.0"^^xsd:float ;
        eg:four "4.0"^^xsd:double ;
        eg:true true ;
        eg:false false ;
        eg:path </foo/bar> ;
        eg:uri eg:value ;
        eg:urid eg:value ;
        eg:string "hello" ;
        eg:langlit "bonjour"@fra ;
        eg:typelit "value"^^eg:Type ;
        eg:blank [
            a eg:Object ;
        ] ;
        eg:tuple [
            a atom:Tuple ;
            rdf:value (
            ) ;
        ] ;
        eg:vector [
            a atom:Vector ;
            rdf:value (
            ) ;
        ] ;
        eg:seq [
            a atom:Sequence ;
            rdf:value (
                [
                    atom:frameTime 1 ;
                    rdf:value "901A01"^^midi:MidiEvent ;
                ] [
                    atom:frameTime 3 ;
                    rdf:value "902B02"^^midi:MidiEvent ;
                ]
            ) ;
        ] ;
    ] .

I anticipate/intend for all plugin control to happen via such messages, since this approach has a few important qualities:

  1. Typically no need to define new binary formats for things (and be held back waiting for others to implement them).
  2. Everything has a portable serialization for free (meaning network transparency, saving/loading, and for developers or power users the ability to dump any message to see what is going on).
  3. The convention is to use "objects" (resources, i.e. things with properties) as messages, which are inherently extensible. No "oops I needed to add a parameter so now compatibility is broken".
  4. Easy to bind to other languages or syntaxes, so e.g. Python or browser-based UI frameworks should be possible.
  5. Any RDF vocabulary can be used, meaning millions of well-defined and documented predicates ("keys") are available right now (though it is perfectly okay to create one-off objects - compatibility with RDF is a benefit, not a burden).

The atom extension includes an API that makes it relatively simple to build such objects in C, so plugins can write them directly to an output port or a ring buffer. See the "forge" API in the Atom extension for details. There are also iterators for all the collections and a "get" function for objects to make reading data simple.

Just in case it's not crystal clear, the above is only the external representation of the corresponding atom. At run-time, an atom (i.e. what plugins work with) is just a blob of data with an integer type and size header. 100% of the API provided for reading and writing atoms is real-time safe and suitable for use in an audio processing thread.
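To make the "blob with a header" idea concrete, here is a rough Python sketch of what an integer atom looks like in memory. It assumes the header is two unsigned 32-bit integers (body size in bytes, then type) in native byte order; the type number below is invented, since real type numbers are URIDs assigned by the host, and the function names are mine rather than anything from the atom API.

```python
import struct

# Assumed header layout: uint32 body size, uint32 type, native byte order.
HEADER = struct.Struct("=II")

# Hypothetical type number; a real host maps the type URI to an integer.
INT_URID = 1


def pack_int_atom(value, type_urid=INT_URID):
    """Build a header followed by a 32-bit signed integer body."""
    body = struct.pack("=i", value)
    return HEADER.pack(len(body), type_urid) + body


def unpack_atom(blob):
    """Split a blob into (type, body) using the size stored in the header."""
    size, type_urid = HEADER.unpack_from(blob, 0)
    body = blob[HEADER.size:HEADER.size + size]
    return type_urid, body
```

A reader only needs the header to skip or dispatch on an atom, which is part of what makes the format cheap to handle in an audio thread. In real plugin code you would use the forge and the C headers rather than anything like this.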

For an example, see the LV2 sampler example, which has a UI that loads samples via such messages. It currently works in Jalv; Ardour support is coming soon. This is the way forward for more powerful LV2 plugin control, and hopefully will end the worrying practice of abusing direct instance access to provide such functionality.

This work isn't finished yet, but the important parts are done and not likely to change significantly. I am interested in hearing any developer feedback; feel free to comment on this post or on the LV2 mailing list.

New LV2 host implementation stack

I have released my new stack of libraries for implementing LV2 in hosts:

  • Serd, a fast, lightweight, dependency-free Turtle syntax library
  • Sord, an in-memory RDF quad store
  • Lilv, an LV2 host library (replaces SLV2)
  • Suil, an LV2 UI loading/embedding library

These libraries collectively replace SLV2, and have no dependencies except amongst themselves, and Glib (by Lilv and Sord, but this dependency will likely be removed in the near future). Serd and Sord replace Redland, making for a dramatically smaller implementation more appropriate for audio software and embedded applications.

Overall, Lilv is dramatically faster and leaner than SLV2, enough that the improvement should be quite noticeable from a user point of view (typically in a lag when the host first loads all LV2 data). Anyone using SLV2 is highly encouraged to migrate to Lilv.

These libraries are well tested, each (except Suil) with a test suite covering over 90% of the code, which runs without memory errors or leaks. They are new, however, so (as always) there may be problems; feedback is most welcome.

LV2 Extension Documentation

Recently there has been a lot of discussion on the linux-audio-dev list about LV2, and perceived problems thereof. Much of this was nonsense and FUD, but there is a problem with documentation of LV2 things in general, and the general public face of the LV2 site and wiki (i.e. it looks really bad and useful information is scattered everywhere, in inconsistent formats, and just generally hard to find or use, if it exists at all).

Towards solving this problem, I've done a lot of work on automatic documentation generation for LV2 extensions, and establishing good conventions for extension creators to follow in the future. The fruits of this effort can be seen by plugging the URI for any concept in an extension hosted at directly into your browser's location bar.

For example, if you encounter event:inheritsTimeStamp in an LV2 data file somewhere and don't know what it means, head to (you will have to look up what the prefix "event" stands for in the file, of course). You are taken directly to documentation on that property generated from the extension itself (as defined in a .ttl file).

The same applies to entire extensions: the event extension's URI is Heading to that URI in your browser will take you to the documentation of that extension. This is the URI used directly in data files that refer to the extension, so no searching whatsoever is required to get at the information you need. As you can see on that page, there are links to all the related files you might need (e.g. headers, a tarball of the extension, etc).

Extension URIs will also content negotiate to return the requested type of data. Try these commands in your shell:

wget -q --header "Accept: application/x-turtle" -O -
wget -q --header "Accept: application/rdf+xml" -O -
wget -q --header "Accept: application/json" -O -
wget -q --header "Accept: text/plain" -O -
wget -q --header "Accept: text/html" -O -

This could be used by LV2 hosts to automatically fetch from the web information they cannot find locally. For example, if an extension is required by a plugin but a host doesn't know what it is, the host can trivially fetch that extension from the web and display a human readable description of what that extension does. This is the kind of thing that using good technology like RDF allows, you naysayers out there ;)
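A host-side version of the same trick might look roughly like this in Python, using only the standard library. The function names are mine, and the URI is whatever extension URI the host found in the plugin data; nothing here is part of any LV2 library.

```python
import urllib.request


def negotiate(uri, media_type):
    """Build a GET request asking the server for `uri` as `media_type`."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch(uri, media_type, timeout=10):
    """Fetch `uri` in the requested representation and return the raw bytes."""
    request = negotiate(uri, media_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A host could call `fetch(extension_uri, "application/x-turtle")` to get data it can parse, or ask for `text/html` to show the user the documentation.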

This is all a bit rough around the edges still, and the extensions hosted here aren't all documented well yet (and of course, not all extensions are hosted here), but I'd say this is a pretty big improvement. If you base your own extension on the form of these ones (which is easy), you get all this for free. Hopefully soon enough most if not all extensions and plugins will work this way, at which point I'd say that the reference documentation problem of LV2 is solved about as well as it possibly could be. Better than any other technology I have seen anyway (how many things do you use where you can just plug whatever you're curious about right into your browser and get taken directly to the documentation for that thing?).

Of course, more high-level "tutorial" type documentation is still needed, but this is a separate problem. As far as reference documentation goes, here is the chicken. Go forth and create eggs.

Indices to the currently hosted extensions: /ext /extensions /dev

Stay tuned for more tools to come, e.g. a similar documentation generator for plugins, a plugin/extension validator, etc.

GStreamer SoC Midterm Summary

Apologies for not blogging along the way for those who are interested in this project. I'm more into actually doing things rather than blogging about doing things ;)

So, LV2 in Gst, where are we? Well, I've had LV2 plugins working with the same functionality as LADSPA plugins for a while. The main downside was shared by LADSPA: no support for multi-channel streams.

Some background for those unfamiliar with LADSPA or LV2: LADSPA and LV2 plugins are very simple in terms of I/O. A plugin has a collection of "ports" which contain either a single floating point value (control) or an array of floating point values (audio). LV2 extends this to any types but that's not important here. This is simple but causes problems in the face of stereo and other multi-channel streams: if a plugin has 3 audio inputs and 2 audio outputs, for example, what is what? Some kind of 3-channel audio input with stereo output? 5 completely unrelated ports? A stereo input and stereo output with a "sidechain" (common with e.g. compressors)? This information is not available in LADSPA; the best you can do is guess (which in practice means it just doesn't work).

GStreamer works with multi-channel streams as a single interleaved stream, so this is a problem. Thankfully, LV2 makes it trivial to add whatever information you like about plugins without having to touch a line of code since plugins are described in RDF (see the LV2 site for more information). All that needs to be decided is how to actually model that information. A specification for this is called an "ontology" in general/theory, and an "extension" in the LV2 community.

So, the problem is we need(ed) a good multi-channel ontology for LV2 plugins to work well inside Gst, since most things in Gst are at least stereo. The difficult thing with creating ontologies is making sure anything anyone might want to describe in the future is accounted for. Here's my best shot at this so far: LV2 Port Groups, based largely on earlier work by Lars Luthman and some input from the LV2 community. This extension isn't final, but expresses all the information needed by Gst for multi-channel (and more). (I also wrote the documentation generator that created the aforementioned specification page in the hopes that more user-friendly documentation will encourage adoption by plugin and host authors.)

I've created patches for the popular SWH and Calf plugin packages to add this information. When the extension goes final they will be included in these projects, but in the meantime they are included in my git branch of gst-plugins-bad (see

This is, as far as I know, the first time coherent multi-channel information has been available about plugins from the "LAD" community (e.g. LADSPA, DSSI, LV2). Coincidentally, this information is required for recent work on Ardour, among other things. In hindsight, this was a pretty glaring hole in the general state of LAD plugins, but back to GStreamer...

I've rewritten quite a bit of the GstSignalProcessor class (used by the LADSPA and now LV2 wrapper elements) to support multi-channel plugins. In code terms this means creating a set of buffers for non-interleaved data used by the plugin, and interleaving/deinterleaving buffers to/from Gst, respectively.
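The conversion itself is simple. Here is a toy Python sketch of the idea, not the actual GstSignalProcessor code (which is C and works on raw float buffers); the function names are just for illustration.

```python
def deinterleave(samples, n_channels):
    """Split one interleaved stream into one buffer per channel."""
    if len(samples) % n_channels != 0:
        raise ValueError("sample count is not a multiple of the channel count")
    # Channel ch owns every n_channels-th sample, starting at offset ch.
    return [list(samples[ch::n_channels]) for ch in range(n_channels)]


def interleave(channels):
    """Merge per-channel buffers back into one interleaved stream."""
    n_frames = len(channels[0])
    if any(len(channel) != n_frames for channel in channels):
        raise ValueError("all channels must have the same number of frames")
    # Take one sample from each channel per frame, in channel order.
    return [sample for frame in zip(*channels) for sample in frame]
```

A stereo stream `[L0, R0, L1, R1]` becomes `[[L0, L1], [R0, R1]]` for the plugin's two audio ports, and the plugin's outputs go back through `interleave` before being handed to the next element.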

In summary: many LV2 plugins now exist with useful multi-channel information, and stereo LV2 plugins now work correctly in GStreamer. It should now be simple to add support for other audio plugin interfaces (VST? AudioUnits?) that also work with multi-channel streams.

Next up:

  • Get the "role" information from the plugin data and use it to support surround streams correctly
  • Finalize and publish the LV2 Port Groups extension, and contribute patches for all major plugin collections

Where to go from there is pretty open-ended. Unlike LADSPA, LV2 can theoretically support any kind of data, or any feature (non-realtime and non-audio things included). Extensions just need to be made to bridge the gap. What sort of functionality would you like to see bridge the GStreamer/LV2 gap?


There's been a bit of talk in the GNOME camp lately about using DOAP instead of the unstructured text files that are the current norm for source packages. On the one hand, people want the benefits of having machine readable data in projects; on the other hand, RDF/XML is a nightmare ("I'll never maintain such bloat!" - "That is one hell of an ugly file.").

This is how RDF/XML hurts RDF. The original loathed file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl"?>
<rdf:RDF xml:lang="en"
  <Project rdf:about="">
    <license rdf:resource="" />
    <name>Apache Ant</name>
    <homepage rdf:resource="" />
    <asfext:pmc rdf:resource="" />
    <shortdesc>Java-based build tool</shortdesc>
    <description>Apache Ant is a Java-based build tool. In theory, it is kind of like Make, but without Make's wrinkles.</description>
    <bug-database rdf:resource="" />
    <mailing-list rdf:resource="" />
    <download-page rdf:resource="" />
    <category rdf:resource="" />
        <name>Apache Ant 1.7.0</name>
        <location rdf:resource=""/>
        <browse rdf:resource=""/>

and the equivalent in Turtle (a subset of N3) (automatically generated with rapper -o turtle doap_Ant.rdf):

@prefix rdf: <> .
@prefix : <> .
@prefix asfext: <> .

<>
    a :Project;
    :created "2006-02-17"@en;
    :license <>;
    :name "Apache Ant"@en;
    :homepage <>;
    asfext:pmc <>;
    :shortdesc "Java-based build tool"@en;
    :description "Apache Ant is a Java-based build tool. In theory, it is kind of like Make, but without Make's wrinkles."@en;
    :bug-database <>;
    :mailing-list <>;
    :download-page <>;
    :programming-language "Java"@en;
    :category <>;
    :release [
        a :Version;
        :name "Apache Ant 1.7.0"@en;
        :created "2006-12-13"@en;
        :revision "1.7.0"@en
    ] ;
    :repository [
        a :SVNRepository;
        :location <>;
        :browse <>
    ] .

I wouldn't want to hand-maintain the RDF/XML version either, but the Turtle version? Sure. It's the exact same information, far more human readable, and about as terse as it could be while representing the same information.

The best thing about a syntax independent model like RDF is... well, it's syntax independent. Choose one that doesn't suck :)
