Serd¶
Serd is a lightweight C library for reading and writing RDF in Turtle, NTriples, NQuads, and TriG.
Overview¶
The API revolves around two main types: the Reader, which reads text and fires callbacks, and the Writer, which writes text when driven by corresponding functions. Both work in a streaming fashion but still support pretty-printing, so the pair can be used to pretty-print, translate, or otherwise process arbitrarily large documents very quickly. The context of a stream is tracked by the Environment, which stores the current base URI and set of namespace prefixes.
The complete API is declared in serd.h:
#include <serd/serd.h>
Serd C API¶
Status Codes¶
-
enum SerdStatus¶
Return status code.
-
enumerator SERD_SUCCESS¶
No error.
-
enumerator SERD_FAILURE¶
Non-fatal failure.
-
enumerator SERD_ERR_UNKNOWN¶
Unknown error.
-
enumerator SERD_ERR_BAD_SYNTAX¶
Invalid syntax.
-
enumerator SERD_ERR_BAD_ARG¶
Invalid argument.
-
enumerator SERD_ERR_NOT_FOUND¶
Not found.
-
enumerator SERD_ERR_ID_CLASH¶
Encountered clashing blank node IDs.
-
enumerator SERD_ERR_BAD_CURIE¶
Invalid CURIE (e.g. prefix does not exist)
-
enumerator SERD_ERR_INTERNAL¶
Unexpected internal error (should not happen)
-
const uint8_t *serd_strerror(SerdStatus status)¶
Return a string describing a status code.
String Utilities¶
-
size_t serd_strlen(const uint8_t *str, size_t *n_bytes, SerdNodeFlags *flags)¶
Measure a UTF-8 string.
- Parameters
str – A null-terminated UTF-8 string.
n_bytes – (Output) Set to the size of str in bytes (excluding the null terminator).
flags – (Output) Set to the applicable flags.
- Returns
Length of str in characters (excluding the null terminator).
-
double serd_strtod(const char *str, char **endptr)¶
Parse a string to a double.
The API of this function is identical to the standard C strtod function, except this function is locale-independent and always matches the lexical format used in the Turtle grammar (the decimal point is always “.”).
-
void *serd_base64_decode(const uint8_t *str, size_t len, size_t *size)¶
Decode a base64 string.
This function can be used to deserialise a blob node created with serd_node_new_blob().
- Parameters
str – Base64 string to decode.
len – The length of str.
size – Set to the size of the returned blob in bytes.
- Returns
A newly allocated blob which must be freed with serd_free().
Byte Streams¶
-
typedef int (*SerdStreamErrorFunc)(void *stream)¶
Function to detect I/O stream errors.
Identical semantics to ferror.
- Returns
Non-zero if stream has encountered an error.
-
typedef size_t (*SerdSource)(void *buf, size_t size, size_t nmemb, void *stream)¶
Source function for raw string input.
Identical semantics to fread, but may set errno for more informative error reporting than supported by SerdStreamErrorFunc.
- Param buf
Output buffer.
- Param size
Size of a single element of data in bytes (always 1).
- Param nmemb
Number of elements to read.
- Param stream
Stream to read from (FILE* for fread).
- Returns
Number of elements (bytes) read.
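As an illustration of the two callback types above, the following is a minimal sketch of a source that serves bytes from memory. The MemStream type and the mem_read and mem_error functions are hypothetical names introduced here, not part of Serd; the sketch only assumes the fread-like and ferror-like semantics described above, and does not include serd.h so that it stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stream (not part of Serd). */
typedef struct {
  const char* data;  /* Bytes to serve */
  size_t      len;   /* Total number of bytes in data */
  size_t      pos;   /* Current read position */
  int         error; /* Non-zero after a failure (never set by this sketch) */
} MemStream;

/* Same signature as SerdSource, with fread-like semantics. */
size_t
mem_read(void* buf, size_t size, size_t nmemb, void* stream)
{
  MemStream* const s = (MemStream*)stream;
  assert(s && s->pos <= s->len);

  if (size == 0 || nmemb == 0) {
    return 0;
  }

  /* Like fread, only whole elements are returned */
  const size_t avail = (s->len - s->pos) / size;
  const size_t n     = nmemb < avail ? nmemb : avail;

  memcpy(buf, s->data + s->pos, n * size);
  s->pos += n * size;
  return n;
}

/* Same signature as SerdStreamErrorFunc, with ferror-like semantics. */
int
mem_error(void* stream)
{
  return ((const MemStream*)stream)->error;
}
```

A pair like this could be passed as the read_func and error_func arguments of serd_reader_start_source_stream(), with a pointer to the MemStream as stream.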
-
typedef size_t (*SerdSink)(const void *buf, size_t len, void *stream)¶
Sink function for raw string output.
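A custom sink only needs to match the signature above. The following is a minimal sketch that appends output to a growable heap buffer. ByteBuffer and buffer_sink are hypothetical names, not part of Serd, and the sketch assumes (by analogy with fwrite and the size_t return type) that a sink returns the number of bytes written, with a short count indicating failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer (not part of Serd). */
typedef struct {
  char*  data; /* Heap buffer, or NULL when empty */
  size_t len;  /* Number of bytes written so far */
} ByteBuffer;

/* Same signature as SerdSink.
   Assumption: returns the number of bytes written, or 0 on failure. */
size_t
buffer_sink(const void* buf, size_t len, void* stream)
{
  ByteBuffer* const out = (ByteBuffer*)stream;
  assert(out);

  if (len == 0) {
    return 0;
  }

  /* Grow the buffer to fit the new bytes, keeping the old one on failure */
  char* const grown = (char*)realloc(out->data, out->len + len);
  if (!grown) {
    return 0;
  }

  memcpy(grown + out->len, buf, len);
  out->data = grown;
  out->len += len;
  return len;
}
```

Such a function could be passed as the ssink argument of serd_writer_new(), with a pointer to the ByteBuffer as stream. The caller remains responsible for freeing data. For most uses, the provided serd_file_sink() and serd_chunk_sink() are likely sufficient.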
URI¶
-
struct SerdURI¶
A parsed URI.
This struct directly refers to chunks in other strings; it does not own any memory itself. Thus, URIs can be parsed and/or resolved against a base URI in-place without allocating memory.
-
uint8_t *serd_file_uri_parse(const uint8_t *uri, uint8_t **hostname)¶
Get the unescaped path and hostname from a file URI.
The returned path and *hostname must be freed with serd_free().
- Parameters
uri – A file URI.
hostname – If non-NULL, set to the hostname, if present.
- Returns
The path component of the URI.
-
bool serd_uri_string_has_scheme(const uint8_t *utf8)¶
Return true iff utf8 starts with a valid URI scheme.
-
SerdStatus serd_uri_parse(const uint8_t *utf8, SerdURI *out)¶
Parse utf8, writing the result to out.
-
void serd_uri_resolve(const SerdURI *r, const SerdURI *base, SerdURI *t)¶
Set target t to reference r resolved against base.
-
size_t serd_uri_serialise(const SerdURI *uri, SerdSink sink, void *stream)¶
Serialise uri with a series of calls to sink.
-
size_t serd_uri_serialise_relative(const SerdURI *uri, const SerdURI *base, const SerdURI *root, SerdSink sink, void *stream)¶
Serialise uri relative to base with a series of calls to sink.
The uri is written as a relative URI iff it is a child of base and root. The optional root parameter must be a prefix of base and can be used to keep up-references (“../”) within a certain namespace.
Node¶
-
struct SerdNode¶
A syntactic RDF node.
-
const uint8_t *buf¶
Value string.
-
size_t n_bytes¶
Size in bytes (excluding null)
-
size_t n_chars¶
String length (excluding null)
-
SerdNodeFlags flags¶
Node flags (string properties)
-
enum SerdType¶
Type of a node.
An RDF node, in the abstract sense, can be either a resource, literal, or a blank. This type is more precise, because syntactically there are two ways to refer to a resource (by URI or CURIE).
There are also two ways to refer to a blank node in syntax (by ID or anonymously), but this is handled by statement flags rather than distinct node types.
-
enumerator SERD_NOTHING¶
The type of a nonexistent node. This type is useful as a sentinel, but is never emitted by the reader.
-
enumerator SERD_LITERAL¶
Literal value. A literal optionally has either a language, or a datatype (not both).
-
enumerator SERD_URI¶
URI (absolute or relative). Value is an unquoted URI string, which is either a relative reference with respect to the current base URI (e.g. “foo/bar”), or an absolute URI (e.g. “http://example.org/foo”). RFC3986
-
enumerator SERD_CURIE¶
CURIE, a shortened URI. Value is an unquoted CURIE string relative to the current environment, e.g. “rdf:type”. CURIE Syntax 1.0
-
enumerator SERD_BLANK¶
A blank node. Value is a blank node ID without any syntactic prefix, like “id3”, which is meaningful only within this serialisation. RDF 1.1 Turtle
-
SerdNode serd_node_from_string(SerdType type, const uint8_t *str)¶
Make a (shallow) node from str.
This measures, but does not copy, str. No memory is allocated.
-
SerdNode serd_node_from_substring(SerdType type, const uint8_t *str, size_t len)¶
Make a (shallow) node from a prefix of str.
This measures, but does not copy, str. No memory is allocated. Note that the returned node may not be null terminated.
-
SerdNode serd_node_new_uri_from_node(const SerdNode *uri_node, const SerdURI *base, SerdURI *out)¶
Simple wrapper for serd_node_new_uri() to resolve a URI node.
-
SerdNode serd_node_new_uri_from_string(const uint8_t *str, const SerdURI *base, SerdURI *out)¶
Simple wrapper for serd_node_new_uri() to resolve a URI string.
-
SerdNode serd_node_new_file_uri(const uint8_t *path, const uint8_t *hostname, SerdURI *out, bool escape)¶
Create a new file URI node from a file system path and optional hostname.
Backslashes in Windows paths will be converted and ‘%’ will always be percent encoded. If escape is true, all other invalid characters will be percent encoded as well.
If path is relative, hostname is ignored. If out is not NULL, it will be set to the parsed URI.
-
SerdNode serd_node_new_uri(const SerdURI *uri, const SerdURI *base, SerdURI *out)¶
Create a new node by serialising uri into a new string.
- Parameters
uri – The URI to serialise.
base – Base URI to resolve uri against (or NULL for no resolution).
out – Set to the parsing of the new URI (i.e. points only to memory owned by the new returned node).
-
SerdNode serd_node_new_relative_uri(const SerdURI *uri, const SerdURI *base, const SerdURI *root, SerdURI *out)¶
Create a new node by serialising uri into a new relative URI.
- Parameters
uri – The URI to serialise.
base – Base URI to make uri relative to, if possible.
root – Root URI for resolution (see serd_uri_serialise_relative()).
out – Set to the parsing of the new URI (i.e. points only to memory owned by the new returned node).
-
SerdNode serd_node_new_decimal(double d, unsigned frac_digits)¶
Create a new node by serialising d into an xsd:decimal string.
The resulting node will always contain a ‘.’, start with a digit, and end with a digit (i.e. will have a leading and/or trailing ‘0’ if necessary). It will never be in scientific notation. A maximum of frac_digits digits will be written after the decimal point, but trailing zeros will automatically be omitted (except one if d is a round integer).
Note that about 16 and 8 fractional digits are required to precisely represent a double and float, respectively.
- Parameters
d – The value for the new node.
frac_digits – The maximum number of digits after the decimal place.
-
SerdNode serd_node_new_integer(int64_t i)¶
Create a new node by serialising i into an xsd:integer string.
-
SerdNode serd_node_new_blob(const void *buf, size_t size, bool wrap_lines)¶
Create a node by serialising buf into an xsd:base64Binary string.
This function can be used to make a serialisable node out of arbitrary binary data, which can be decoded using serd_base64_decode().
- Parameters
buf – Raw binary input data.
size – Size of buf.
wrap_lines – Wrap lines at 76 characters to conform to RFC 2045.
-
SerdNode serd_node_copy(const SerdNode *node)¶
Make a deep copy of node.
- Returns
A node that the caller must free with serd_node_free().
Event Handlers¶
-
struct SerdError¶
An error description.
-
SerdStatus status¶
Error code.
-
const uint8_t *filename¶
File with error.
-
unsigned line¶
Line in file with error or 0.
-
unsigned col¶
Column in file with error.
-
const char *fmt¶
Printf-style format string.
-
va_list *args¶
Arguments for fmt.
-
enum SerdStatementFlag¶
Flags indicating inline abbreviation information for a statement.
-
enumerator SERD_EMPTY_S¶
Empty blank node subject.
-
enumerator SERD_EMPTY_O¶
Empty blank node object.
-
enumerator SERD_ANON_S_BEGIN¶
Start of anonymous subject.
-
enumerator SERD_ANON_O_BEGIN¶
Start of anonymous object.
-
enumerator SERD_ANON_CONT¶
Continuation of anonymous node.
-
enumerator SERD_LIST_S_BEGIN¶
Start of list subject.
-
enumerator SERD_LIST_O_BEGIN¶
Start of list object.
-
enumerator SERD_LIST_CONT¶
Continuation of list.
-
typedef uint32_t SerdStatementFlags¶
Bitwise OR of SerdStatementFlag values.
-
typedef SerdStatus (*SerdErrorSink)(void *handle, const SerdError *error)¶
Sink (callback) for errors.
- Param handle
Handle for user data.
- Param error
Error description.
-
typedef SerdStatus (*SerdBaseSink)(void *handle, const SerdNode *uri)¶
Sink (callback) for base URI changes.
Called whenever the base URI of the serialisation changes.
-
typedef SerdStatus (*SerdPrefixSink)(void *handle, const SerdNode *name, const SerdNode *uri)¶
Sink (callback) for namespace definitions.
Called whenever a prefix is defined in the serialisation.
-
typedef SerdStatus (*SerdStatementSink)(void *handle, SerdStatementFlags flags, const SerdNode *graph, const SerdNode *subject, const SerdNode *predicate, const SerdNode *object, const SerdNode *object_datatype, const SerdNode *object_lang)¶
Sink (callback) for statements.
Called for every RDF statement in the serialisation.
-
typedef SerdStatus (*SerdEndSink)(void *handle, const SerdNode *node)¶
Sink (callback) for anonymous node end markers.
This is called to indicate that the given anonymous node will no longer be referred to by any future statements (i.e. the anonymous serialisation of the node is finished).
Environment¶
-
typedef struct SerdEnvImpl SerdEnv¶
Lexical environment for relative URIs or CURIEs (base URI and namespaces)
-
SerdStatus serd_env_set_base_uri(SerdEnv *env, const SerdNode *uri)¶
Set the current base URI.
-
SerdStatus serd_env_set_prefix(SerdEnv *env, const SerdNode *name, const SerdNode *uri)¶
Set a namespace prefix.
A namespace prefix is used to expand CURIE nodes, for example, with the prefix “xsd” set to “http://www.w3.org/2001/XMLSchema#”, “xsd:decimal” will expand to “http://www.w3.org/2001/XMLSchema#decimal”.
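The expansion described above is, at its core, a string operation: the prefix name and colon are replaced by the namespace URI. The following sketch reproduces the “xsd:decimal” example with a single prefix binding in plain C. It is an illustration only, not Serd's implementation, and expand_curie is a hypothetical helper; with Serd itself, the binding would be stored in a SerdEnv and expanded with serd_env_expand() or serd_env_expand_node().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustration only (not part of Serd): expand `curie` using one prefix
   binding of `name` to `uri`, writing the result to `out`.
   Returns 0 on success, or -1 if the prefix does not match or `out` is
   too small to hold the result. */
int
expand_curie(const char* name,
             const char* uri,
             const char* curie,
             char*       out,
             size_t      out_size)
{
  assert(name && uri && curie && out);

  /* The CURIE must start with the prefix name followed by ':' */
  const size_t name_len = strlen(name);
  if (strncmp(curie, name, name_len) != 0 || curie[name_len] != ':') {
    return -1;
  }

  /* The full URI is the namespace URI followed by the CURIE suffix */
  const char* const suffix = curie + name_len + 1;
  const int         n      = snprintf(out, out_size, "%s%s", uri, suffix);

  return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Called with the name “xsd”, the URI “http://www.w3.org/2001/XMLSchema#”, and the CURIE “xsd:decimal”, this writes “http://www.w3.org/2001/XMLSchema#decimal” to out.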
-
SerdStatus serd_env_set_prefix_from_strings(SerdEnv *env, const uint8_t *name, const uint8_t *uri)¶
Set a namespace prefix.
-
bool serd_env_qualify(const SerdEnv *env, const SerdNode *uri, SerdNode *prefix, SerdChunk *suffix)¶
Qualify uri into a CURIE if possible.
-
SerdStatus serd_env_expand(const SerdEnv *env, const SerdNode *curie, SerdChunk *uri_prefix, SerdChunk *uri_suffix)¶
Expand curie.
Errors: SERD_ERR_BAD_ARG if curie is not valid, or SERD_ERR_BAD_CURIE if prefix is not defined in env.
-
SerdNode serd_env_expand_node(const SerdEnv *env, const SerdNode *node)¶
Expand node, which must be a CURIE or URI, to a full URI.
Returns null if node cannot be expanded.
-
void serd_env_foreach(const SerdEnv *env, SerdPrefixSink func, void *handle)¶
Call func for each prefix defined in env.
Reader¶
-
typedef struct SerdReaderImpl SerdReader¶
Streaming parser that reads a text stream and writes to a statement sink.
-
SerdReader *serd_reader_new(SerdSyntax syntax, void *handle, void (*free_handle)(void*), SerdBaseSink base_sink, SerdPrefixSink prefix_sink, SerdStatementSink statement_sink, SerdEndSink end_sink)¶
Create a new RDF reader.
-
void serd_reader_set_strict(SerdReader *reader, bool strict)¶
Enable or disable strict parsing.
The reader is non-strict (lax) by default, which tolerates URIs with invalid characters. In strict mode, parsing such files will fail. An error is printed for invalid input in either case.
-
void serd_reader_set_error_sink(SerdReader *reader, SerdErrorSink error_sink, void *error_handle)¶
Set a function to be called when errors occur during reading.
The error_sink will be called with error_handle as its first argument. If no error function is set, errors are printed to stderr in GCC style.
-
void *serd_reader_get_handle(const SerdReader *reader)¶
Return the handle passed to serd_reader_new().
-
void serd_reader_add_blank_prefix(SerdReader *reader, const uint8_t *prefix)¶
Set a prefix to be added to all blank node identifiers.
This is useful when multiple files are to be parsed into the same output (a model or a file). Since Serd preserves blank node IDs, this could cause conflicts where two non-equivalent blank nodes are merged, resulting in corrupt data. By setting a unique blank node prefix for each parsed file, this can be avoided, while preserving blank node names.
-
void serd_reader_set_default_graph(SerdReader *reader, const SerdNode *graph)¶
Set the URI of the default graph.
If this is set, the reader will emit quads with the graph set to the given node for any statements that are not in a named graph.
-
SerdStatus serd_reader_read_file(SerdReader *reader, const uint8_t *uri)¶
Read a file at a given uri.
-
SerdStatus serd_reader_start_stream(SerdReader *reader, FILE *file, const uint8_t *name, bool bulk)¶
Start an incremental read from a file handle.
Iff bulk is true, file will be read a page at a time. This is more efficient, but uses a page of memory and means that an entire page of input must be ready before any callbacks will fire. To react as soon as input arrives, set bulk to false.
-
SerdStatus serd_reader_start_source_stream(SerdReader *reader, SerdSource read_func, SerdStreamErrorFunc error_func, void *stream, const uint8_t *name, size_t page_size)¶
Start an incremental read from a user-specified source.
The read_func is guaranteed to only be called for page_size elements with size 1 (i.e. page_size bytes).
-
SerdStatus serd_reader_read_chunk(SerdReader *reader)¶
Read a single “chunk” of data during an incremental read.
This function will read a single top level description, and return. This may be a directive, statement, or several statements; essentially it reads until a ‘.’ is encountered. This is particularly useful for reading directly from a pipe or socket.
-
SerdStatus serd_reader_end_stream(SerdReader *reader)¶
Finish an incremental read from a file handle.
-
SerdStatus serd_reader_read_file_handle(SerdReader *reader, FILE *file, const uint8_t *name)¶
Read file.
-
SerdStatus serd_reader_read_source(SerdReader *reader, SerdSource source, SerdStreamErrorFunc error, void *stream, const uint8_t *name, size_t page_size)¶
Read a user-specified byte source.
-
SerdStatus serd_reader_read_string(SerdReader *reader, const uint8_t *utf8)¶
Read utf8.
-
void serd_reader_free(SerdReader *reader)¶
Free reader.
Writer¶
-
enum SerdStyle¶
Syntax style options.
These flags allow more precise control of writer output style. Note that some options are only supported for some syntaxes, for example, NTriples does not support abbreviation and is always ASCII.
-
enumerator SERD_STYLE_ABBREVIATED¶
Abbreviate triples when possible.
-
enumerator SERD_STYLE_ASCII¶
Escape all non-ASCII characters.
-
enumerator SERD_STYLE_RESOLVED¶
Resolve URIs against base URI.
-
enumerator SERD_STYLE_CURIED¶
Shorten URIs into CURIEs.
-
enumerator SERD_STYLE_BULK¶
Write output in pages.
-
typedef struct SerdWriterImpl SerdWriter¶
Streaming serialiser that writes a text stream as statements are pushed.
-
SerdWriter *serd_writer_new(SerdSyntax syntax, SerdStyle style, SerdEnv *env, const SerdURI *base_uri, SerdSink ssink, void *stream)¶
Create a new RDF writer.
-
void serd_writer_free(SerdWriter *writer)¶
Free writer.
-
SerdEnv *serd_writer_get_env(SerdWriter *writer)¶
Return the env used by writer.
-
size_t serd_file_sink(const void *buf, size_t len, void *stream)¶
A convenience sink function for writing to a FILE*.
This function can be used as a SerdSink when writing to a FILE*. The stream parameter must be a FILE* opened for writing.
-
size_t serd_chunk_sink(const void *buf, size_t len, void *stream)¶
A convenience sink function for writing to a string.
This function can be used as a SerdSink to write to a SerdChunk which is resized as necessary with realloc(). The stream parameter must point to an initialized SerdChunk. When the write is finished, the string should be retrieved with serd_chunk_sink_finish().
-
uint8_t *serd_chunk_sink_finish(SerdChunk *stream)¶
Finish a serialisation to a chunk with serd_chunk_sink().
The returned string is the result of the serialisation, which is null terminated (by this function) and owned by the caller.
-
void serd_writer_set_error_sink(SerdWriter *writer, SerdErrorSink error_sink, void *error_handle)¶
Set a function to be called when errors occur during writing.
The error_sink will be called with error_handle as its first argument. If no error function is set, errors are printed to stderr.
-
void serd_writer_chop_blank_prefix(SerdWriter *writer, const uint8_t *prefix)¶
Set a prefix to be removed from matching blank node identifiers.
This is the counterpart to serd_reader_add_blank_prefix() which can be used to “undo” added prefixes.
-
SerdStatus serd_writer_set_base_uri(SerdWriter *writer, const SerdNode *uri)¶
Set the current output base URI, and emit a directive if applicable.
Note that this function can be safely cast to SerdBaseSink.
-
SerdStatus serd_writer_set_root_uri(SerdWriter *writer, const SerdNode *uri)¶
Set the current root URI.
The root URI should be a prefix of the base URI. The path of the root URI is the highest path any relative up-reference can refer to. For example, with root file:///foo/root and base file:///foo/root/base, file:///foo/root will be written as <../>, but file:///foo will be written non-relatively as file:///foo. If the root is not explicitly set, it defaults to the base URI, so no up-references will be created at all.
-
SerdStatus serd_writer_set_prefix(SerdWriter *writer, const SerdNode *name, const SerdNode *uri)¶
Set a namespace prefix (and emit directive if applicable).
Note that this function can be safely cast to SerdPrefixSink.
-
SerdStatus serd_writer_write_statement(SerdWriter *writer, SerdStatementFlags flags, const SerdNode *graph, const SerdNode *subject, const SerdNode *predicate, const SerdNode *object, const SerdNode *datatype, const SerdNode *lang)¶
Write a statement.
Note that this function can be safely cast to SerdStatementSink.
-
SerdStatus serd_writer_end_anon(SerdWriter *writer, const SerdNode *node)¶
Mark the end of an anonymous node’s description.
Note that this function can be safely cast to SerdEndSink.
-
SerdStatus serd_writer_finish(SerdWriter *writer)¶
Finish a write.
This flushes any pending output, for example terminating punctuation, so that the output is a complete document.
-
struct SerdChunk¶
An unterminated string fragment.
-
const uint8_t *buf¶
Start of chunk.
-
size_t len¶
Length of chunk in bytes.
-
enum SerdSyntax¶
RDF syntax type.
-
enumerator SERD_TURTLE¶
Terse triples http://www.w3.org/TR/turtle.
-
enumerator SERD_NTRIPLES¶
Line-based triples http://www.w3.org/TR/n-triples/.
-
enumerator SERD_NQUADS¶
Line-based quads http://www.w3.org/TR/n-quads/.
-
enumerator SERD_TRIG¶
Terse quads http://www.w3.org/TR/trig/.
-
enum SerdNodeFlag¶
Flags indicating certain string properties relevant to serialisation.
-
enumerator SERD_HAS_NEWLINE¶
Contains line breaks (‘\n’ or ‘\r’)
-
enumerator SERD_HAS_QUOTE¶
Contains quotes (‘”’)
-
typedef uint32_t SerdNodeFlags¶
Bitwise OR of SerdNodeFlag values.
-
void serd_free(void *ptr)¶
Free memory allocated by Serd.
This function exists because some systems require memory allocated by a library to be freed by code in the same library. It is otherwise equivalent to the standard C free() function.