4.8 Stream Processing

The above examples show how a document can be either written to a file or loaded into a model, simply by changing the sink that the data is written to. There are also sinks that filter or transform the data before passing it on to another sink, which can be used to build more advanced pipelines with several processing stages.
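The chaining idea can be sketched independently of serd. The following is a toy illustration in plain C, not serd's implementation: the ToySink type and every toy_ function are hypothetical names invented for this sketch. Each sink holds a handler and an optional target, so a stage can do its work and then forward the result to the next sink.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sink: a handler plus the next sink in the pipeline. */
typedef struct ToySink ToySink;
struct ToySink {
  void (*on_text)(ToySink* self, const char* text);
  ToySink* target;  /* Next stage, or NULL for a terminal sink */
  char     buf[64]; /* Terminal sinks record what they receive here */
};

/* Terminal stage: store the text it receives (truncating if too long). */
static void toy_collect(ToySink* self, const char* text)
{
  strncpy(self->buf, text, sizeof(self->buf) - 1);
  self->buf[sizeof(self->buf) - 1] = '\0';
}

/* Transforming stage: upper-case the text, then pass it to the target. */
static void toy_upcase(ToySink* self, const char* text)
{
  char   out[64];
  size_t i = 0;
  for (; text[i] && i < sizeof(out) - 1; ++i) {
    out[i] = (char)toupper((unsigned char)text[i]);
  }
  out[i] = '\0';

  assert(self->target);
  self->target->on_text(self->target, out);
}

/* Build a two-stage pipeline (upcase -> collector) and push text through. */
static const char* toy_run(ToySink* collector, const char* text)
{
  ToySink upcase = {toy_upcase, collector, {0}};

  collector->on_text = toy_collect;
  collector->target  = NULL;
  upcase.on_text(&upcase, text);
  return collector->buf;
}
```

The caller only ever writes to the first sink. Swapping the terminal sink, or inserting another stage in front of it, does not change the code that produces the data.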

4.8.1 Canonical Literals

A canon is a stream processor that converts literals with supported XSD datatypes into canonical form. For example, this will rewrite an xsd:decimal literal like “.10” as “0.1”. A canon is created with serd_canon_new(), which needs to be passed the “target” sink that the transformed statements should be written to, for example:

SerdSink* canon = serd_canon_new(world, inserter, 0);

The last argument is a bitwise OR of SerdCanonFlag flags. For example, SERD_CANON_LAX will tolerate and pass through invalid literals, which can be useful for cleaning up questionable data as much as possible without losing any information.
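To make the notion of canonical form concrete, here is a rough sketch of the xsd:decimal case only. It is not how serd implements canonicalisation, and toy_canon_decimal is a hypothetical helper written for this illustration. It assumes the usual XSD rules: no leading "+", no redundant leading or trailing zeros, and at least one digit on each side of the decimal point.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write the canonical form of the xsd:decimal
   string `in` to `out`.  Returns 0 on success, or -1 if `in` is not a
   valid decimal or `out` is too small. */
static int toy_canon_decimal(const char* in, char* out, size_t out_size)
{
  assert(in && out);

  const char* p   = in;
  int         neg = 0;
  if (*p == '+' || *p == '-') {
    neg = (*p == '-');
    ++p;
  }

  /* Locate the integer and fractional digit runs */
  const char* int_start = p;
  while (isdigit((unsigned char)*p)) { ++p; }
  const char* int_end = p;

  const char* frac_start = p;
  if (*p == '.') {
    frac_start = ++p;
    while (isdigit((unsigned char)*p)) { ++p; }
  }
  const char* frac_end = p;

  /* Reject trailing garbage and inputs with no digits at all */
  if (*p != '\0' || (int_end - int_start) + (frac_end - frac_start) == 0) {
    return -1;
  }

  /* Drop redundant leading and trailing zeros */
  while (int_start < int_end && *int_start == '0') { ++int_start; }
  while (frac_end > frac_start && frac_end[-1] == '0') { --frac_end; }

  const size_t int_len  = (size_t)(int_end - int_start);
  const size_t frac_len = (size_t)(frac_end - frac_start);
  if (int_len == 0 && frac_len == 0) {
    neg = 0; /* Zero has no sign */
  }

  /* Sign, integer digits, point, fraction digits, terminator */
  const size_t need = (size_t)neg + (int_len ? int_len : 1) + 1 +
                      (frac_len ? frac_len : 1) + 1;
  if (need > out_size) {
    return -1;
  }

  char* o = out;
  if (neg) { *o++ = '-'; }
  if (int_len) { memcpy(o, int_start, int_len); o += int_len; } else { *o++ = '0'; }
  *o++ = '.';
  if (frac_len) { memcpy(o, frac_start, frac_len); o += frac_len; } else { *o++ = '0'; }
  *o = '\0';
  return 0;
}
```

With this helper, ".10" becomes "0.1" and "+001.50" becomes "1.5". Returning an error for input such as "1.2.3" corresponds to the strict behaviour; a lax processor in the spirit of SERD_CANON_LAX would pass such a literal through unchanged instead.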

4.8.2 Filtering Statements

A filter is a stream processor that filters statements based on a pattern. It can be configured in either inclusive or exclusive mode, which passes through only statements that match or don’t match the pattern, respectively. A filter is created with serd_filter_new(), which takes a target, pattern, and inclusive flag. For example, all statements with predicate rdf:type could be filtered out when loading a model:

SerdSink* filter = serd_filter_new(world,    // World
                                   inserter, // Target
                                   NULL,     // Subject
                                   rdf_type, // Predicate
                                   NULL,     // Object
                                   NULL,     // Graph
                                   false);   // Inclusive

If true is passed for the last parameter instead, then the filter operates in inclusive mode and will instead insert only statements with predicate rdf:type.
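The decision a filter makes for each statement can be sketched in plain C. This follows the definition of the two modes given at the start of this section and assumes that NULL pattern fields act as wildcards, as the NULL arguments in the example suggest. ToyStatement and the toy_ functions are hypothetical names for this sketch, not part of serd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical statement: four fields as plain strings.
   In a pattern, a NULL field matches anything. */
typedef struct {
  const char* subject;
  const char* predicate;
  const char* object;
  const char* graph;
} ToyStatement;

/* A NULL pattern field is a wildcard; otherwise the strings must be equal. */
static bool toy_field_matches(const char* pattern, const char* value)
{
  return !pattern || (value && strcmp(pattern, value) == 0);
}

/* True if every field of the statement matches the pattern. */
static bool toy_matches(const ToyStatement* pattern, const ToyStatement* s)
{
  return toy_field_matches(pattern->subject, s->subject) &&
         toy_field_matches(pattern->predicate, s->predicate) &&
         toy_field_matches(pattern->object, s->object) &&
         toy_field_matches(pattern->graph, s->graph);
}

/* True if the filter should pass the statement on to its target:
   inclusive mode passes matches, exclusive mode passes non-matches. */
static bool toy_filter_passes(const ToyStatement* pattern,
                              bool                inclusive,
                              const ToyStatement* s)
{
  assert(pattern && s);
  return toy_matches(pattern, s) == inclusive;
}
```

For a pattern with only the predicate set to rdf:type, inclusive mode passes a statement such as (ex:s, rdf:type, ex:T) and drops everything else, while exclusive mode does the opposite.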