NAME

serdi - read, filter, transform, and write RDF data

SYNOPSIS

serdi [-Cabefhlmqtvx] [-F pattern | -G pattern] [-I base] [-V checks] [-X checks] [-c prefix] [-i syntax] [-k bytes] [-o syntax] [-p prefix] [-r root] [-s string] [-w filename] input ...

DESCRIPTION

serdi is a fast command-line utility for processing RDF data. It reads one or more documents and writes the data again, possibly transformed and/or in a different syntax. By default, the input syntax is guessed from the file extension, and output is written in NTriples or NQuads.

serdi can be used to check for syntax errors, convert from one syntax to another, pretty-print documents, or transform URIs and blank node IDs.

The options are as follows:

-C: Convert literals to canonical form. Literals with supported XSD datatypes are parsed and rewritten in canonical form. All numeric datatypes are supported, as well as boolean, duration, dateTime, time, hexBinary, and base64Binary.
-F pattern: Filter out statements that match pattern. The pattern must be a single statement written in NTriples or NQuads, with variables like “?name” for wildcards. The names of variables in the pattern are insignificant.
-G pattern: Only include statements that match pattern. This option is like -F but inverted, so that only matching statements are included, like grep.
-I base: Input base URI. Relative URI references in the input are resolved against base. When the input is a file, the URI of the file is automatically used as the base URI. This option can be used to override that, or to provide a base URI for input from stdin or a string.
-V checks: Validate data with the given checks. The checks argument is a regular expression that matches the names of the checks to enable, or the special value “all”, which enables all checks. See VALIDATION below for a detailed list of all checks. Validation requires a model, so this option implicitly enables -m.
-X checks: Exclude checks from the set of checks enabled by a previous -V option. This is typically used after -V all to suppress a few specific checks.
-a: Write ASCII output. If this is enabled, all non-ASCII characters will be escaped, even if the output syntax allows them to be written in UTF-8.
-b: Bulk output writing. If this is enabled, output will be written a page at a time, rather than a byte at a time.
-c prefix: Chop prefix from matching blank node IDs. This is the inverse of -p.
-e: Eat input one character at a time, rather than a page at a time, which is the default. This is useful when reading from a pipe, since output is generated immediately as input arrives, rather than only after an entire page of input has arrived. With this option, serdi uses one page less memory, but is likely to be significantly slower.
-f: Fast and loose mode. This disables shortening URIs into prefixed names or relative URI references. If the model is enabled (-m), then this writes the model quickly in sorted order. Note that doing so with TriG or Turtle may make the output less readable, since blank nodes will not be inlined.
-h: Print the command line options.
-i syntax: Read input as syntax. Case is ignored; valid values are “NQuads”, “NTriples”, “TriG”, and “Turtle”.
-k bytes: Parser stack size. For performance reasons, parsing is performed with a fixed-size stack. By default, the stack is one page (4096 bytes), which should be sufficient for parsing most documents. This option can be used to increase or decrease the amount of memory available for parsing. It must be large enough to fit one statement, with some extra space for internal records. Something around twice the length of the longest statement in text (as if it were written in NTriples or NQuads) is a reasonable value.
-l: Lax (non-strict) parsing. If this is enabled, recoverable syntax errors will print a warning, but parsing will proceed starting at the next statement if possible. Note that data may be lost when using this option.
-m: Build a model in memory. This loads all of the input into memory before writing the output. This will reorder statements and eliminate duplicates, at the cost of performance and memory consumption. When writing TriG or Turtle, this may enable better pretty-printing with more inline descriptions.
-o syntax: Write output as syntax. Case is ignored; valid values are “empty”, “NQuads”, “NTriples”, “TriG”, and “Turtle”. When “empty” is given, output is suppressed, so only errors are printed.
-p prefix: Add prefix to blank node IDs. This can be used to avoid clashes between blank node IDs in input documents.
-q: Suppress all output except data.
-r root: Keep relative URIs within a root URI. This will avoid creating any relative URI references with leading path segments like “../” that enter a parent of root.
-s string: Parse string as input.
-t: Write terser output without newlines.
-v: Display version information and exit.
-w filename: Write output to the given filename instead of stdout.
-x: Support parsing variable nodes. Variables can be written in SPARQL style, for example “?var” or “$var”.
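The -k guideline above (around twice the length of the longest statement as written in NTriples or NQuads) can be estimated with standard tools. A minimal sketch, assuming the data has first been converted to NTriples or NQuads with one statement per line, as serdi writes by default. suggest_stack_size is a hypothetical helper, not part of serd:

```shell
# Hypothetical helper, not part of serd: estimate a -k value as twice the
# length in bytes of the longest line of an NTriples or NQuads file.
# Assumes one statement per line, as serdi writes by default.
suggest_stack_size() {
  # LC_ALL=C makes awk's length() count bytes rather than characters.
  LC_ALL=C awk '
    length($0) > max { max = length($0) }
    END { print 2 * max }
  ' "$1"
}
```

For example:

$ serdi file.ttl > file.nt
$ serdi -k "$(suggest_stack_size file.nt)" -o turtle file.ttl > out.ttl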

VALIDATION

allValuesFrom: Checks that all properties with owl:allValuesFrom restrictions have valid value types.
cardinalityEqual: Checks that any instance of a class with an owl:cardinality property restriction has exactly that many values of that property.
cardinalityMax: Checks that any instance of a class with an owl:maxCardinality property restriction has no more than that many values of that property.
cardinalityMin: Checks that any instance of a class with an owl:minCardinality property restriction has at least that many values of that property.
classLabel: Checks that every rdfs:Class has an rdfs:label.
datatypeProperty: Checks that datatype properties have literal (not instance) values.
datatypeType: Checks that every datatype is defined as an rdfs:Datatype.
deprecatedClass: Checks that there are no instances of deprecated classes.
deprecatedProperty: Checks that there are no uses of deprecated properties.
functionalProperty: Checks that no instance has several values of a functional property.
instanceLiteral: Checks that there are no instances where a literal is expected.
instanceType: Checks that every instance with an explicit type matches that type. This is a broad check that also triggers other type-related checks, but mainly it checks that every instance of a class conforms to any restrictions on that class.
inverseFunctionalProperty: Checks that at most one instance has a given value of an inverse functional property.
literalInstance: Checks that there are no literals where an instance is expected.
literalMaxExclusive: Checks that literal values are not greater than or equal to any applicable xsd:maxExclusive datatype restrictions.
literalMaxInclusive: Checks that literal values are not greater than any applicable xsd:maxInclusive datatype restrictions.
literalMinExclusive: Checks that literal values are not less than or equal to any applicable xsd:minExclusive datatype restrictions.
literalMinInclusive: Checks that literal values are not less than any applicable xsd:minInclusive datatype restrictions.
literalPattern: Checks that literals with xsd:pattern restrictions match the regular expression pattern for their datatype.
literalRestriction: Checks that literals with supported restrictions conform to those restrictions. This is a high-level check that triggers the more specific individual literal restriction checks.
literalValue: Checks that literals with supported XSD datatypes are valid. The set of supported types is the same as for writing canonical forms with -C.
objectProperty: Checks that object properties have instance (not literal) values.
plainLiteralDatatype: Checks that there are no typed literals where a plain literal is expected. A plain literal may have an optional language tag, but not a datatype.
predicateType: Checks that every predicate is defined as an rdf:Property.
propertyDomain: Checks that any instance that has a property with an rdfs:domain is in that domain.
propertyLabel: Checks that every rdf:Property has an rdfs:label.
propertyRange: Checks that the value for any property with an rdfs:range is in that range.
someValuesFrom: Checks that instances of classes with owl:someValuesFrom property restrictions have at least one matching property value.
subclassCycle: Checks that no class is a sub-class of itself, recursively. This ensures that the graph is acyclic with respect to rdfs:subClassOf.
subpropertyCycle: Checks that no property is a sub-property of itself, recursively. This ensures that the graph is acyclic with respect to rdfs:subPropertyOf.
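Check names can be combined with -V and -X. As a sketch, assuming -X accepts the same kind of regular expression as -V, the hypothetical wrapper below (not part of serd) enables every check except classLabel and propertyLabel, and uses -o empty so that only problems are printed. The pattern “.*Label” is chosen so that it selects those two names whether or not matching is anchored. The vocabularies that define the classes and properties being checked presumably need to be given as inputs along with the data.

```shell
# Hypothetical wrapper, not part of serd: run every validation check except
# classLabel and propertyLabel over the given inputs (data and vocabularies).
# -V implies -m, and "-o empty" suppresses output so only problems are printed.
# Assumes -X takes a regular expression, like -V.
validate_without_labels() {
  serdi -V all -X '.*Label' -o empty "$@"
}
```

For example:

$ validate_without_labels schema.ttl data.ttl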

EXIT STATUS

serdi exits with a status of 0 on success, or non-zero if an error occurred.
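Because the exit status reflects errors, serdi can be used in scripts to check many files at once. A sketch with a hypothetical helper, check_rdf_files (not part of serd), which reports each file and returns non-zero if any of them failed:

```shell
# Hypothetical helper, not part of serd: check several files for errors,
# relying only on serdi's exit status (0 on success, non-zero on error).
check_rdf_files() {
  failed=0
  for file in "$@"; do
    # "-o empty" suppresses output, so serdi only prints errors.
    if serdi -o empty "$file"; then
      printf 'ok: %s\n' "$file"
    else
      printf 'error: %s\n' "$file" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example:

$ check_rdf_files *.ttl || echo "some files have errors"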

EXAMPLES

To pretty-print a document:

$ serdi -o turtle file.ttl > out.ttl

To print any errors:

$ serdi file.ttl > /dev/null

To remove any rdf:type properties:

$ serdi -F "?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o ." file.ttl

To include only rdf:type properties:

$ serdi -G "?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o ." file.ttl
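To merge several documents without blank node ID clashes, each can be converted separately with a distinct -p prefix. Since the default NTriples or NQuads output is line-based, the results can simply be concatenated. A sketch with a hypothetical helper, merge_rdf (not part of serd); the prefix scheme doc1_, doc2_, ... is an illustrative choice:

```shell
# Hypothetical helper, not part of serd: merge several RDF documents into
# NTriples/NQuads on stdout, giving each document's blank nodes a distinct
# prefix (doc1_, doc2_, ...) so that IDs from different files cannot clash.
merge_rdf() {
  n=0
  for file in "$@"; do
    n=$((n + 1))
    # Stop at the first file serdi fails to process.
    serdi -p "doc${n}_" "$file" || return 1
  done
}
```

For example:

$ merge_rdf a.ttl b.ttl > merged.nt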

STANDARDS

W3C, RDF 1.1 NQuads, February 2014. https://www.w3.org/TR/n-quads/
W3C, RDF 1.1 NTriples, February 2014. https://www.w3.org/TR/n-triples/
W3C, RDF 1.1 TriG, February 2014. https://www.w3.org/TR/trig/
W3C, RDF 1.1 Turtle, February 2014. https://www.w3.org/TR/turtle/

AUTHORS

serdi is a part of serd, by David Robillard <d@drobilla.net>.