Serd

Serd is a lightweight C library for reading and writing RDF in Turtle, NTriples, NQuads, and TriG.

Overview

The API revolves around two main types: the Reader, which reads text and fires callbacks, and the Writer, which writes text when driven by corresponding functions. Both work in a streaming fashion but still support pretty-printing, so the pair can be used to pretty-print, translate, or otherwise process arbitrarily large documents very quickly. The context of a stream is tracked by the Environment, which stores the current base URI and set of namespace prefixes.

The complete API is declared in serd.h:

#include <serd/serd.h>

Serd C API

Status Codes

enum SerdStatus

Return status code.

enumerator SERD_SUCCESS

No error.

enumerator SERD_FAILURE

Non-fatal failure.

enumerator SERD_ERR_UNKNOWN

Unknown error.

enumerator SERD_ERR_BAD_SYNTAX

Invalid syntax.

enumerator SERD_ERR_BAD_ARG

Invalid argument.

enumerator SERD_ERR_NOT_FOUND

Not found.

enumerator SERD_ERR_ID_CLASH

Encountered clashing blank node IDs.

enumerator SERD_ERR_BAD_CURIE

Invalid CURIE (e.g. prefix does not exist)

enumerator SERD_ERR_INTERNAL

Unexpected internal error (should not happen)

const uint8_t *serd_strerror(SerdStatus status)

Return a string describing a status code.

String Utilities

size_t serd_strlen(const uint8_t *str, size_t *n_bytes, SerdNodeFlags *flags)

Measure a UTF-8 string.

Returns

Length of str in characters (excluding the terminating null).

Parameters
  • str – A null-terminated UTF-8 string.

  • n_bytes – (Output) Set to the size of str in bytes (excluding the terminating null).

  • flags – (Output) Set to the applicable flags.

double serd_strtod(const char *str, char **endptr)

Parse a string to a double.

The API of this function is identical to the standard C strtod function, except this function is locale-independent and always matches the lexical format used in the Turtle grammar (the decimal point is always “.”).

void *serd_base64_decode(const uint8_t *str, size_t len, size_t *size)

Decode a base64 string.

This function can be used to deserialise a blob node created with serd_node_new_blob().

Parameters
  • str – Base64 string to decode.

  • len – The length of str.

  • size – Set to the size of the returned blob in bytes.

Returns

A newly allocated blob which must be freed with serd_free().

Byte Streams

typedef int (*SerdStreamErrorFunc)(void *stream)

Function to detect I/O stream errors.

Identical semantics to ferror.

Returns

Non-zero if stream has encountered an error.

typedef size_t (*SerdSource)(void *buf, size_t size, size_t nmemb, void *stream)

Source function for raw string input.

Identical semantics to fread, but may set errno for more informative error reporting than supported by SerdStreamErrorFunc.

Parameters
  • buf – Output buffer.

  • size – Size of a single element of data in bytes (always 1).

  • nmemb – Number of elements to read.

  • stream – Stream to read from (FILE* for fread).

Returns

Number of elements (bytes) read.
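To illustrate the fread-like contract, here is a sketch of a source that serves bytes from memory, together with a matching error function. The MemStream type and both function names are hypothetical. The functions only mirror the SerdSource and SerdStreamErrorFunc signatures, so this block compiles without serd.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stream state (not part of Serd). */
typedef struct {
  const char* data;   /* Bytes to serve */
  size_t      len;    /* Total number of bytes */
  size_t      offset; /* Number of bytes already consumed */
} MemStream;

/* Same signature as SerdSource: copy up to size * nmemb bytes into buf
   and return the number of whole elements read, like fread. */
static size_t
mem_read(void* buf, size_t size, size_t nmemb, void* stream)
{
  MemStream* const mem    = (MemStream*)stream;
  const size_t     wanted = size * nmemb;
  const size_t     left   = mem->len - mem->offset;
  const size_t     n      = wanted < left ? wanted : left;

  memcpy(buf, mem->data + mem->offset, n);
  mem->offset += n;
  return size ? n / size : 0;
}

/* Same signature as SerdStreamErrorFunc: reads from memory never fail. */
static int
mem_error(void* stream)
{
  (void)stream;
  return 0;
}
```

Functions shaped like these are intended to be passed as the source and error arguments of serd_reader_read_source(), with a pointer to the MemStream as the stream.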

typedef size_t (*SerdSink)(const void *buf, size_t len, void *stream)

Sink function for raw string output.

URI

struct SerdURI

A parsed URI.

This struct refers directly to chunks in other strings; it does not own any memory itself. Thus, URIs can be parsed and/or resolved against a base URI in-place without allocating memory.

SerdChunk scheme

Scheme.

SerdChunk authority

Authority.

SerdChunk path_base

Path prefix if relative.

SerdChunk path

Path suffix.

SerdChunk query

Query.

SerdChunk fragment

Fragment.

const SerdURI SERD_URI_NULL

uint8_t *serd_file_uri_parse(const uint8_t *uri, uint8_t **hostname)

Get the unescaped path and hostname from a file URI.

The returned path and *hostname must be freed with serd_free().

Parameters
  • uri – A file URI.

  • hostname – If non-NULL, set to the hostname, if present.

Returns

The path component of the URI.

bool serd_uri_string_has_scheme(const uint8_t *utf8)

Return true iff utf8 starts with a valid URI scheme.

SerdStatus serd_uri_parse(const uint8_t *utf8, SerdURI *out)

Parse utf8, writing result to out

void serd_uri_resolve(const SerdURI *r, const SerdURI *base, SerdURI *t)

Set target t to reference r resolved against base.

See RFC 3986, section 5.2.2.

size_t serd_uri_serialise(const SerdURI *uri, SerdSink sink, void *stream)

Serialise uri with a series of calls to sink

size_t serd_uri_serialise_relative(const SerdURI *uri, const SerdURI *base, const SerdURI *root, SerdSink sink, void *stream)

Serialise uri relative to base with a series of calls to sink

The uri is written as a relative URI iff it is a child of base and root. The optional root parameter must be a prefix of base and can be used to keep up-references (“../”) within a certain namespace.

Node

struct SerdNode

A syntactic RDF node.

const uint8_t *buf

Value string.

size_t n_bytes

Size in bytes (excluding null)

size_t n_chars

String length (excluding null)

SerdNodeFlags flags

Node flags (string properties)

SerdType type

Node type.

enum SerdType

Type of a node.

An RDF node, in the abstract sense, can be either a resource, literal, or a blank. This type is more precise, because syntactically there are two ways to refer to a resource (by URI or CURIE).

There are also two ways to refer to a blank node in syntax (by ID or anonymously), but this is handled by statement flags rather than distinct node types.

enumerator SERD_NOTHING

The type of a nonexistent node. This type is useful as a sentinel, but is never emitted by the reader.

enumerator SERD_LITERAL

Literal value. A literal optionally has either a language, or a datatype (not both).

enumerator SERD_URI

URI (absolute or relative). Value is an unquoted URI string, which is either a relative reference with respect to the current base URI (e.g. “foo/bar”), or an absolute URI (e.g. “http://example.org/foo”). See RFC 3986.

enumerator SERD_CURIE

CURIE, a shortened URI. Value is an unquoted CURIE string relative to the current environment, e.g. “rdf:type”. See CURIE Syntax 1.0.

enumerator SERD_BLANK

A blank node. Value is a blank node ID without any syntactic prefix, like “id3”, which is meaningful only within this serialisation. See RDF 1.1 Turtle.

const SerdNode SERD_NODE_NULL

SerdNode serd_node_from_string(SerdType type, const uint8_t *str)

Make a (shallow) node from str.

This measures, but does not copy, str. No memory is allocated.

SerdNode serd_node_from_substring(SerdType type, const uint8_t *str, size_t len)

Make a (shallow) node from a prefix of str.

This measures, but does not copy, str. No memory is allocated. Note that the returned node may not be null terminated.

SerdNode serd_node_new_uri_from_node(const SerdNode *uri_node, const SerdURI *base, SerdURI *out)

Simple wrapper for serd_node_new_uri() to resolve a URI node.

SerdNode serd_node_new_uri_from_string(const uint8_t *str, const SerdURI *base, SerdURI *out)

Simple wrapper for serd_node_new_uri() to resolve a URI string.

SerdNode serd_node_new_file_uri(const uint8_t *path, const uint8_t *hostname, SerdURI *out, bool escape)

Create a new file URI node from a file system path and optional hostname.

Backslashes in Windows paths will be converted and ‘%’ will always be percent encoded. If escape is true, all other invalid characters will be percent encoded as well.

If path is relative, hostname is ignored. If out is not NULL, it will be set to the parsed URI.

SerdNode serd_node_new_uri(const SerdURI *uri, const SerdURI *base, SerdURI *out)

Create a new node by serialising uri into a new string.

Parameters
  • uri – The URI to serialise.

  • base – Base URI to resolve uri against (or NULL for no resolution).

  • out – Set to the parsing of the new URI (i.e. points only to memory owned by the new returned node).

SerdNode serd_node_new_relative_uri(const SerdURI *uri, const SerdURI *base, const SerdURI *root, SerdURI *out)

Create a new node by serialising uri into a new relative URI.

Parameters
  • uri – The URI to serialise.

  • base – Base URI to make uri relative to, if possible.

  • root – Root URI for resolution (see serd_uri_serialise_relative()).

  • out – Set to the parsing of the new URI (i.e. points only to memory owned by the new returned node).

SerdNode serd_node_new_decimal(double d, unsigned frac_digits)

Create a new node by serialising d into an xsd:decimal string.

The resulting node will always contain a ‘.’, start with a digit, and end with a digit (i.e. will have a leading and/or trailing ‘0’ if necessary). It will never be in scientific notation. A maximum of frac_digits digits will be written after the decimal point, but trailing zeros will automatically be omitted (except one if d is a round integer).

Note that about 16 and 8 fractional digits are required to precisely represent a double and float, respectively.

Parameters
  • d – The value for the new node.

  • frac_digits – The maximum number of digits after the decimal place.

SerdNode serd_node_new_integer(int64_t i)

Create a new node by serialising i into an xsd:integer string.

SerdNode serd_node_new_blob(const void *buf, size_t size, bool wrap_lines)

Create a node by serialising buf into an xsd:base64Binary string.

This function can be used to make a serialisable node out of arbitrary binary data, which can be decoded using serd_base64_decode().

Parameters
  • buf – Raw binary input data.

  • size – Size of buf.

  • wrap_lines – Wrap lines at 76 characters to conform to RFC 2045.

SerdNode serd_node_copy(const SerdNode *node)

Make a deep copy of node.

Returns

A node that the caller must free with serd_node_free().

bool serd_node_equals(const SerdNode *a, const SerdNode *b)

Return true iff a is equal to b

void serd_node_free(SerdNode *node)

Free any data owned by node.

Note that if node is itself dynamically allocated (which is not the case for nodes created internally by serd), it will not be freed.

Event Handlers

struct SerdError

An error description.

SerdStatus status

Error code.

const uint8_t *filename

File with error.

unsigned line

Line in file with error or 0.

unsigned col

Column in file with error.

const char *fmt

Printf-style format string.

va_list *args

Arguments for fmt.

enum SerdStatementFlag

Flags indicating inline abbreviation information for a statement.

enumerator SERD_EMPTY_S

Empty blank node subject.

enumerator SERD_EMPTY_O

Empty blank node object.

enumerator SERD_ANON_S_BEGIN

Start of anonymous subject.

enumerator SERD_ANON_O_BEGIN

Start of anonymous object.

enumerator SERD_ANON_CONT

Continuation of anonymous node.

enumerator SERD_LIST_S_BEGIN

Start of list subject.

enumerator SERD_LIST_O_BEGIN

Start of list object.

enumerator SERD_LIST_CONT

Continuation of list.

typedef uint32_t SerdStatementFlags

Bitwise OR of SerdStatementFlag values.

typedef SerdStatus (*SerdErrorSink)(void *handle, const SerdError *error)

Sink (callback) for errors.

Parameters
  • handle – Handle for user data.

  • error – Error description.

typedef SerdStatus (*SerdBaseSink)(void *handle, const SerdNode *uri)

Sink (callback) for base URI changes.

Called whenever the base URI of the serialisation changes.

typedef SerdStatus (*SerdPrefixSink)(void *handle, const SerdNode *name, const SerdNode *uri)

Sink (callback) for namespace definitions.

Called whenever a prefix is defined in the serialisation.

typedef SerdStatus (*SerdStatementSink)(void *handle, SerdStatementFlags flags, const SerdNode *graph, const SerdNode *subject, const SerdNode *predicate, const SerdNode *object, const SerdNode *object_datatype, const SerdNode *object_lang)

Sink (callback) for statements.

Called for every RDF statement in the serialisation.

typedef SerdStatus (*SerdEndSink)(void *handle, const SerdNode *node)

Sink (callback) for anonymous node end markers.

This is called to indicate that the anonymous node with the given value will no longer be referred to by any future statements (i.e. the anonymous serialisation of the node is finished).

Environment

typedef struct SerdEnvImpl SerdEnv

Lexical environment for relative URIs or CURIEs (base URI and namespaces)

SerdEnv *serd_env_new(const SerdNode *base_uri)

Create a new environment.

void serd_env_free(SerdEnv *env)

Free env

const SerdNode *serd_env_get_base_uri(const SerdEnv *env, SerdURI *out)

Get the current base URI.

SerdStatus serd_env_set_base_uri(SerdEnv *env, const SerdNode *uri)

Set the current base URI.

SerdStatus serd_env_set_prefix(SerdEnv *env, const SerdNode *name, const SerdNode *uri)

Set a namespace prefix.

A namespace prefix is used to expand CURIE nodes, for example, with the prefix “xsd” set to “http://www.w3.org/2001/XMLSchema#”, “xsd:decimal” will expand to “http://www.w3.org/2001/XMLSchema#decimal”.

SerdStatus serd_env_set_prefix_from_strings(SerdEnv *env, const uint8_t *name, const uint8_t *uri)

Set a namespace prefix.

bool serd_env_qualify(const SerdEnv *env, const SerdNode *uri, SerdNode *prefix, SerdChunk *suffix)

Qualify uri into a CURIE if possible.

SerdStatus serd_env_expand(const SerdEnv *env, const SerdNode *curie, SerdChunk *uri_prefix, SerdChunk *uri_suffix)

Expand curie.

Errors: SERD_ERR_BAD_ARG if curie is not valid, or SERD_ERR_BAD_CURIE if prefix is not defined in env.

SerdNode serd_env_expand_node(const SerdEnv *env, const SerdNode *node)

Expand node, which must be a CURIE or URI, to a full URI.

Returns null if node cannot be expanded.

void serd_env_foreach(const SerdEnv *env, SerdPrefixSink func, void *handle)

Call func for each prefix defined in env

Reader

typedef struct SerdReaderImpl SerdReader

Streaming parser that reads a text stream and writes to a statement sink.

SerdReader *serd_reader_new(SerdSyntax syntax, void *handle, void (*free_handle)(void*), SerdBaseSink base_sink, SerdPrefixSink prefix_sink, SerdStatementSink statement_sink, SerdEndSink end_sink)

Create a new RDF reader.

void serd_reader_set_strict(SerdReader *reader, bool strict)

Enable or disable strict parsing.

The reader is non-strict (lax) by default, which means it tolerates URIs with invalid characters. When strict is set, parsing such files fails. An error is printed for invalid input in either case.

void serd_reader_set_error_sink(SerdReader *reader, SerdErrorSink error_sink, void *error_handle)

Set a function to be called when errors occur during reading.

The error_sink will be called with error_handle as its first argument. If no error function is set, errors are printed to stderr in GCC style.

void *serd_reader_get_handle(const SerdReader *reader)

Return the handle passed to serd_reader_new()

void serd_reader_add_blank_prefix(SerdReader *reader, const uint8_t *prefix)

Set a prefix to be added to all blank node identifiers.

This is useful when multiple files are to be parsed into the same output (a model or a file). Since Serd preserves blank node IDs, this could cause conflicts where two non-equivalent blank nodes are merged, resulting in corrupt data. By setting a unique blank node prefix for each parsed file, this can be avoided, while preserving blank node names.

void serd_reader_set_default_graph(SerdReader *reader, const SerdNode *graph)

Set the URI of the default graph.

If this is set, the reader will emit quads with the graph set to the given node for any statements that are not in a named graph.

SerdStatus serd_reader_read_file(SerdReader *reader, const uint8_t *uri)

Read a file at a given uri

SerdStatus serd_reader_start_stream(SerdReader *reader, FILE *file, const uint8_t *name, bool bulk)

Start an incremental read from a file handle.

Iff bulk is true, file will be read a page at a time. This is more efficient, but uses a page of memory and means that an entire page of input must be ready before any callbacks will fire. To react as soon as input arrives, set bulk to false.

SerdStatus serd_reader_start_source_stream(SerdReader *reader, SerdSource read_func, SerdStreamErrorFunc error_func, void *stream, const uint8_t *name, size_t page_size)

Start an incremental read from a user-specified source.

The read_func is guaranteed to only be called for page_size elements with size 1 (i.e. page_size bytes).

SerdStatus serd_reader_read_chunk(SerdReader *reader)

Read a single “chunk” of data during an incremental read.

This function will read a single top level description, and return. This may be a directive, statement, or several statements; essentially it reads until a ‘.’ is encountered. This is particularly useful for reading directly from a pipe or socket.

SerdStatus serd_reader_end_stream(SerdReader *reader)

Finish an incremental read from a file handle.

SerdStatus serd_reader_read_file_handle(SerdReader *reader, FILE *file, const uint8_t *name)

Read file

SerdStatus serd_reader_read_source(SerdReader *reader, SerdSource source, SerdStreamErrorFunc error, void *stream, const uint8_t *name, size_t page_size)

Read a user-specified byte source.

SerdStatus serd_reader_read_string(SerdReader *reader, const uint8_t *utf8)

Read utf8

void serd_reader_free(SerdReader *reader)

Free reader

Writer

enum SerdStyle

Syntax style options.

These flags allow more precise control of writer output style. Note that some options are only supported for some syntaxes, for example, NTriples does not support abbreviation and is always ASCII.

enumerator SERD_STYLE_ABBREVIATED

Abbreviate triples when possible.

enumerator SERD_STYLE_ASCII

Escape all non-ASCII characters.

enumerator SERD_STYLE_RESOLVED

Resolve URIs against base URI.

enumerator SERD_STYLE_CURIED

Shorten URIs into CURIEs.

enumerator SERD_STYLE_BULK

Write output in pages.

typedef struct SerdWriterImpl SerdWriter

Streaming serialiser that writes a text stream as statements are pushed.

SerdWriter *serd_writer_new(SerdSyntax syntax, SerdStyle style, SerdEnv *env, const SerdURI *base_uri, SerdSink ssink, void *stream)

Create a new RDF writer.

void serd_writer_free(SerdWriter *writer)

Free writer

SerdEnv *serd_writer_get_env(SerdWriter *writer)

Return the env used by writer

size_t serd_file_sink(const void *buf, size_t len, void *stream)

A convenience sink function for writing to a FILE*.

This function can be used as a SerdSink when writing to a FILE*. The stream parameter must be a FILE* opened for writing.

size_t serd_chunk_sink(const void *buf, size_t len, void *stream)

A convenience sink function for writing to a string.

This function can be used as a SerdSink to write to a SerdChunk which is resized as necessary with realloc(). The stream parameter must point to an initialized SerdChunk. When the write is finished, the string should be retrieved with serd_chunk_sink_finish().

uint8_t *serd_chunk_sink_finish(SerdChunk *stream)

Finish a serialisation to a chunk with serd_chunk_sink().

The returned string is the result of the serialisation, which is null terminated (by this function) and owned by the caller.

void serd_writer_set_error_sink(SerdWriter *writer, SerdErrorSink error_sink, void *error_handle)

Set a function to be called when errors occur during writing.

The error_sink will be called with error_handle as its first argument. If no error function is set, errors are printed to stderr.

void serd_writer_chop_blank_prefix(SerdWriter *writer, const uint8_t *prefix)

Set a prefix to be removed from matching blank node identifiers.

This is the counterpart to serd_reader_add_blank_prefix() which can be used to “undo” added prefixes.

SerdStatus serd_writer_set_base_uri(SerdWriter *writer, const SerdNode *uri)

Set the current output base URI, and emit a directive if applicable.

Note this function can be safely cast to SerdBaseSink.

SerdStatus serd_writer_set_root_uri(SerdWriter *writer, const SerdNode *uri)

Set the current root URI.

The root URI should be a prefix of the base URI. The path of the root URI is the highest path any relative up-reference can refer to. For example, with root file:///foo/root and base file:///foo/root/base, file:///foo/root will be written as <../>, but file:///foo will be written non-relatively as file:///foo. If the root is not explicitly set, it defaults to the base URI, so no up-references will be created at all.

SerdStatus serd_writer_set_prefix(SerdWriter *writer, const SerdNode *name, const SerdNode *uri)

Set a namespace prefix (and emit directive if applicable).

Note this function can be safely cast to SerdPrefixSink.

SerdStatus serd_writer_write_statement(SerdWriter *writer, SerdStatementFlags flags, const SerdNode *graph, const SerdNode *subject, const SerdNode *predicate, const SerdNode *object, const SerdNode *datatype, const SerdNode *lang)

Write a statement.

Note this function can be safely cast to SerdStatementSink.

SerdStatus serd_writer_end_anon(SerdWriter *writer, const SerdNode *node)

Mark the end of an anonymous node’s description.

Note this function can be safely cast to SerdEndSink.

SerdStatus serd_writer_finish(SerdWriter *writer)

Finish a write.

This flushes any pending output, for example terminating punctuation, so that the output is a complete document.

struct SerdChunk

An unterminated string fragment.

const uint8_t *buf

Start of chunk.

size_t len

Length of chunk in bytes.

enum SerdSyntax

RDF syntax type.

enumerator SERD_TURTLE

Terse triples http://www.w3.org/TR/turtle.

enumerator SERD_NTRIPLES

Line-based triples http://www.w3.org/TR/n-triples/.

enumerator SERD_NQUADS

Line-based quads http://www.w3.org/TR/n-quads/.

enumerator SERD_TRIG

Terse quads http://www.w3.org/TR/trig/.

enum SerdNodeFlag

Flags indicating certain string properties relevant to serialisation.

enumerator SERD_HAS_NEWLINE

Contains line breaks (‘\n’ or ‘\r’)

enumerator SERD_HAS_QUOTE

Contains quotes (‘"’)

typedef uint32_t SerdNodeFlags

Bitwise OR of SerdNodeFlag values.

void serd_free(void *ptr)

Free memory allocated by Serd.

This function exists because some systems require memory allocated by a library to be freed by code in the same library. It is otherwise equivalent to the standard C free() function.