Serd

Serd is a lightweight C library for RDF syntax which supports reading and writing Turtle and NTriples.

Serd is not intended to be a Swiss Army knife of RDF syntax. Rather, it is suited to resource-limited or performance-critical applications (e.g. converting many gigabytes of NTriples to Turtle), or to situations where a simple reader/writer with minimal dependencies is ideal (e.g. in LV2 implementations or embedded applications).

Features

  • Free: Serd is Free Software released under an extremely liberal license which allows use by both open and proprietary projects.

  • Small: Serd is implemented in around 3000 lines[1] of standard C code. On a typical machine it compiles to about 90 KiB, but can be as small as 29 KiB when optimized for size. For comparison, on the same system raptor is 417 KiB and libxml2 is 2.1 MiB (not including dependencies), making Serd roughly 5 and 25 times smaller, respectively.

  • Portable and Dependency Free: Serd uses only the C standard library and has no external dependencies. It is known to compile with GCC, LLVM/Clang, and MSVC (as C++), and is tested on GNU/Linux, OpenBSD, Mac OS X, and Windows.

  • Fast and Lightweight: Serd (and the included serdi tool) can stream abbreviated Turtle, unlike many tools which must build an internal model to abbreviate. In other words, Serd can serialise an unbounded amount of abbreviated Turtle using a fixed amount of memory, and it does so very quickly: to the author’s knowledge, Serd is the fastest Turtle reader/writer by a wide margin (see Performance below).

  • Conformant and Well-Tested: Serd is written to the Turtle, NTriples and URI specifications, and includes a comprehensive test suite containing all the tests from the Turtle specification, all the “normal” examples from the URI specification, and several additional tests written specifically for Serd. The test suite has 100% code coverage (by line), and runs with zero memory errors or leaks[2].
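
To illustrate how abbreviated Turtle can be streamed in a fixed amount of memory, here is a minimal sketch. It is not Serd's writer: the names (AbbrevWriter, abbrev_write, string_sink and so on) are hypothetical, and it assumes that statements arrive grouped by subject and predicate (as in a sorted NTriples dump) and that terms are already formatted as Turtle and shorter than 256 bytes. It remembers only the previous subject and predicate, emitting "," when the subject and predicate repeat, ";" when only the subject repeats, and "." to end a subject's description.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write callback: consumes len bytes from buf, returns the count written. */
typedef size_t (*ByteSink)(const void* buf, size_t len, void* stream);

/* Hypothetical streaming abbreviator. Only the previous subject and
 * predicate are kept, so memory use is fixed however much is written. */
typedef struct {
    ByteSink sink;
    void*    stream;
    char     subject[256];   /* previous subject, "" before the first statement */
    char     predicate[256]; /* previous predicate */
} AbbrevWriter;

static void emit(AbbrevWriter* w, const char* str)
{
    w->sink(str, strlen(str), w->stream);
}

static void abbrev_init(AbbrevWriter* w, ByteSink sink, void* stream)
{
    w->sink         = sink;
    w->stream       = stream;
    w->subject[0]   = '\0';
    w->predicate[0] = '\0';
}

/* Writes one statement; s, p and o are pre-formatted Turtle terms. */
static void abbrev_write(AbbrevWriter* w, const char* s, const char* p, const char* o)
{
    assert(strlen(s) < sizeof w->subject && strlen(p) < sizeof w->predicate);

    if (w->subject[0] != '\0' && strcmp(s, w->subject) == 0) {
        if (strcmp(p, w->predicate) == 0) {
            emit(w, " ,\n\t\t"); /* same subject and predicate: another object */
        } else {
            emit(w, " ;\n\t");   /* same subject, new predicate */
            emit(w, p);
            emit(w, " ");
        }
    } else {
        if (w->subject[0] != '\0') {
            emit(w, " .\n");     /* new subject: close the previous one */
        }
        emit(w, s);
        emit(w, "\n\t");
        emit(w, p);
        emit(w, " ");
    }
    emit(w, o);

    snprintf(w->subject, sizeof w->subject, "%s", s);
    snprintf(w->predicate, sizeof w->predicate, "%s", p);
}

/* Closes the last subject's description, if any. */
static void abbrev_finish(AbbrevWriter* w)
{
    if (w->subject[0] != '\0') {
        emit(w, " .\n");
    }
    w->subject[0] = '\0';
}

/* Example sink that appends to a fixed in-memory buffer (NUL-terminated). */
typedef struct {
    char   data[1024];
    size_t len;
} StringBuf;

static size_t string_sink(const void* buf, size_t len, void* stream)
{
    StringBuf* sb = (StringBuf*)stream;
    if (sb->len + len >= sizeof sb->data) {
        return 0; /* no room: drop this chunk */
    }
    memcpy(sb->data + sb->len, buf, len);
    sb->len += len;
    sb->data[sb->len] = '\0';
    return len;
}
```

If the input is not grouped, the output is still valid Turtle, only less abbreviated. Because output goes through a callback, writing to a file would only need a sink that wraps fwrite().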

Performance

Serdi Throughput (NTriples to Turtle)

Command                                     Memory       HDD time  HDD throughput  SSD time  SSD throughput
serdi -b -f -i ntriples -o turtle input.nt  3.22 MiB     0:35      64.2 MiB/s      0:27      83.2 MiB/s
grep 'dbpedia' input.nt                     3.6 MiB      0:35      64.2 MiB/s      0:21      107.0 MiB/s
sed 's/dbpedia/example/' input.nt           3.7 MiB      0:37      60.7 MiB/s      0:22      102.1 MiB/s
rapper -i ntriples -o turtle input.nt       11124.2 MiB  3:02      12.3 MiB/s      3:03      12.3 MiB/s
rapper -i ntriples -o ntriples input.nt     10.6 MiB     1:26      26.1 MiB/s      1:08      33.0 MiB/s

Input is mappingbased_properties_en.nt from DBpedia, fetched on 2011-12-12 (~17.5M triples, 2247 MiB uncompressed). The system is a Debian GNU/Linux machine running Linux 3.1.1 on an Intel Core i7-2620M. “Memory” is the maximum resident set size (peak memory use). “Time” is wall-clock time. For reliable benchmarking, the file system cache was flushed before each run with echo 3 > /proc/sys/vm/drop_caches. The variance between identical runs was less than 2 MiB/s; the best of three runs is shown. Output was redirected to a file on the same disk. Measurements were taken with /usr/bin/time -v. Note that the last raptor entry performs a simpler (non-abbreviating) task and is included for comparison.
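
Each throughput figure in the table is the 2247 MiB input size divided by the wall-clock time. The helper below is hypothetical, written only to check that arithmetic; it assumes times are given in the table's "m:ss" form.

```c
#include <stdio.h>

/* Hypothetical helper: throughput in MiB/s for size_mib MiB processed in a
 * wall-clock time written as "m:ss", the format used in the table above.
 * Returns -1.0 if the time cannot be parsed or is zero. */
static double throughput_mib_s(double size_mib, const char* mmss)
{
    unsigned minutes = 0;
    unsigned seconds = 0;
    if (sscanf(mmss, "%u:%u", &minutes, &seconds) != 2) {
        return -1.0;
    }
    const unsigned total = minutes * 60u + seconds;
    return total > 0 ? size_mib / (double)total : -1.0;
}
```

For instance, throughput_mib_s(2247.0, "0:35") gives 64.2 MiB/s, while throughput_mib_s(2247.0, "3:02") gives approximately 12.346 MiB/s, shown in the table as 12.3.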

These results show that serdi can convert NTriples to abbreviated Turtle using a small, constant amount of memory. Serd is fast enough that the process is entirely I/O bound when reading from the hard disk, where serdi matches the throughput of grep (64.2 MiB/s). The solid-state drive is fast enough that Serd can’t quite keep up, so grep and sed, which do much less processing, are faster, as expected.

Download

The latest version of Serd is 0.20.0, released on August 8, 2014.

Documentation

Man pages and HTML documentation are built and installed by the source distribution when configured with --docs.

Support

Serd is developed and given away freely for the benefit of all. However, donations in appreciation of the considerable time and effort spent are welcome.

Development

Notes

  1. Excluding comments and blank lines, as counted by David A. Wheeler’s SLOCCount.
  2. According to Valgrind.

15 thoughts on “Serd”

  1. Pingback: drobilla :: New LV2 host implementation stack

  2. Pingback: drobilla :: Serd 0.5.0

  3. Pingback: LV2 atom and state serialisation | drobilla.net

  4. Pingback: Serd 0.14.0 | drobilla.net

  5. Pingback: Sratom 0.2.0 | drobilla.net

  6. Pingback: Serd 0.18.0 | drobilla.net

  7. Pingback: Serd 0.18.2 | drobilla.net

  8. Hello!

    Thanks for a magnificent piece of code! It’s a pleasure to read it and works like a charm :)

    I was wondering if you know some way of reading nt.gz and ttl.gz files directly from C by linking to libz. Looking at the code, maybe by reimplementing serd_open() to use open(), and page() to call read()?

    Thanks!

    • Thanks! I actually think I was a little too focused on brevity and not enough on clarity when I wrote it, but it’s nice to have somebody notice :)

      Hmm, interesting idea. Possibly very useful given Serd’s suitability for massive dumps. Indeed, the file handling is already mostly abstracted away, so this seems quite feasible. The abstraction would just have to be extended a bit to wrap everything, including reading, but this would be nice anyway: I suspect that open() and friends might be faster than fopen() and friends, since Serd does its own paging and the libc buffering is just bloat.

      On the other hand you could just pipe the file through serdi if command line usage is the goal. I am not sure how much faster custom support for zlib would actually be, and I would like to keep serd dependency free…

      The way to go about this would be to make a SerdSource, essentially the same idea as SerdSink, probably also internal to avoid any overhead, though it would be nice to expose in the API so apps can use whatever sources as well.

  9. Thanks for the reply!

    Exactly, I was also thinking of big dumps. From the terminal you can always use pipes with serdi, but from a C app it is more difficult. Well, actually you can create the pipe using:
    file = popen("gunzip -c file.nt.gz", "r");
    The advantage is that you can use any decompressor available on the system, and it does the hard work in a separate process. The disadvantage is that it is not so portable (e.g. to Windows).

    I think that exposing a SerdSource is a great balance, because you allow the developers to do all these tricks without adding dependencies :)

  10. Pingback: Yertle: an RDF/Turtle library in Yeti | The Breakfast Post

  11. Hi, Thanks for the wonderful tool.
    I was trying to use it to convert the Freebase dump (https://developers.google.com/freebase/data), which is in Turtle format, to NTriples, but the program does nothing. The command I’ve used is serdi -b -f -i turtle -o ntriples freebase.ttl
    Is there something wrong with the command? I’ve also tried different options, still with no luck.

    Thanks

  12. Pingback: ‘Big Data’ – The Next Frontier - Graduate Admissions

  13. Pingback: Serd 0.20.0 | drobilla.net
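
The SerdSource idea from the comment thread, a read-side counterpart of SerdSink, can be sketched as follows. This is not Serd's API: ReadFunc, Source, source_getc, MemStream and file_read are hypothetical names, and giving the callback the same parameters as fread() is an assumption made so that a FILE* from fopen() or popen() can be plugged in through a one-line wrapper. The reader still pages through a fixed-size buffer, and no decompression library becomes a dependency.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical read callback with the same parameters as fread(). */
typedef size_t (*ReadFunc)(void* buf, size_t size, size_t nmemb, void* stream);

/* Hypothetical paged byte source: one fixed-size page, refilled on demand. */
typedef struct {
    ReadFunc read_func;
    void*    stream;
    char     page[4096];
    size_t   len; /* valid bytes in page */
    size_t   pos; /* index of the next byte to return */
} Source;

static void source_init(Source* src, ReadFunc read_func, void* stream)
{
    src->read_func = read_func;
    src->stream    = stream;
    src->len       = 0;
    src->pos       = 0;
}

/* Returns the next byte as an unsigned char value, or EOF at end of stream. */
static int source_getc(Source* src)
{
    if (src->pos == src->len) {
        src->len = src->read_func(src->page, 1, sizeof src->page, src->stream);
        src->pos = 0;
        if (src->len == 0) {
            return EOF;
        }
    }
    return (unsigned char)src->page[src->pos++];
}

/* Wrapper so a FILE* (from fopen() or popen()) can serve as the stream. */
size_t file_read(void* buf, size_t size, size_t nmemb, void* stream)
{
    return fread(buf, size, nmemb, (FILE*)stream);
}

/* In-memory stream, e.g. for tests. */
typedef struct {
    const char* data;
    size_t      len;
    size_t      off;
} MemStream;

/* Copies up to nmemb whole items of the given size; returns items copied. */
static size_t mem_read(void* buf, size_t size, size_t nmemb, void* stream)
{
    MemStream* m = (MemStream*)stream;
    if (size == 0) {
        return 0;
    }
    const size_t items = (m->len - m->off) / size;
    const size_t n     = nmemb < items ? nmemb : items;
    memcpy(buf, m->data + m->off, n * size);
    m->off += n * size;
    return n;
}
```

For a compressed dump, the stream could be the FILE* returned by popen("gunzip -c file.nt.gz", "r"), passed together with file_read, which keeps the decompressor outside the library.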
