serdi - read, filter, transform, and write RDF data
serdi [-Cabefhlmqtvx] [-F pattern | -G pattern] [-I base] [-V checks]
      [-X checks] [-c prefix] [-i syntax] [-k bytes] [-o syntax] [-p prefix]
      [-r root] [-s string] [-w filename] input ...
serdi is a fast command-line utility for processing RDF data. It reads one
or more documents and writes the data again, possibly transformed and/or in
a different syntax. By default, the input syntax is guessed from the file
extension, and output is written in NTriples or NQuads.
serdi can be used to check for syntax errors, convert from one syntax to
another, pretty-print documents, or transform URIs and blank node IDs.
The options are as follows:
-C
- Convert literals to canonical form. Literals with supported XSD datatypes
will be parsed and rewritten canonically. All numeric datatypes are
supported, as well as boolean, duration, dateTime, time, hexBinary, and
base64Binary.
-F pattern
- Filter out statements that match pattern. The
pattern must be a single statement written in NTriples or NQuads, with
variables like “?name” for wildcards. The names of variables
in the pattern are insignificant.
-G pattern
- Only include statements that match pattern. This option is like -F but
inverted, so that only matching statements are included, like grep.
-I base
- Input base URI. Relative URI references in the input will be resolved
against this. When the input is a file, the URI of the file is
automatically used as the base URI. This option can be used to override
that, or to provide a base URI for input from stdin or a string.
-V checks
- Validate data with the given checks, which is a regular expression that
matches a set of check names to enable, or the special value “all” which
enables all checks. See VALIDATION below for a detailed list of all checks.
Validation requires a model, so this option implicitly enables -m.
-X checks
- Exclude checks from the set of checks enabled by a previous -V option.
This is typically used after -V all to suppress a few specific checks.
-a
- Write ASCII output. If this is enabled, all non-ASCII characters will be
escaped, even if the output syntax allows them to be written in UTF-8.
-b
- Bulk output writing. If this is enabled, output will be written a page at
a time, rather than a byte at a time.
-c prefix
- Chop prefix from matching blank node IDs. This is the inverse of -p.
-e
- Eat input one character at a time, rather than a page at a time, which is
the default. This is useful when reading from a pipe since output will be
generated immediately as input arrives, rather than waiting until an
entire page of input has arrived. With this option serdi uses one page
less memory, but will likely be significantly slower.
-f
- Fast and loose mode. This disables shortening URIs into prefixed names or
relative URI references. If the model is enabled, then this writes the
model quickly in sorted order. Note that doing so with TriG or Turtle may
make the output ugly, since blank nodes will not be inlined.
-h
- Print the command line options.
-i syntax
- Read input as syntax. Case is ignored; valid values are: “NQuads”,
“NTriples”, “TriG”, and “Turtle”.
-k bytes
- Parser stack size. For performance reasons, parsing is performed with a
fixed-size stack. By default, the stack is one page (4096 bytes), which
should be sufficient for parsing most documents. This option can be used
to increase or decrease the amount of memory available for parsing. It
must be large enough to fit one statement, with some extra space for
internal records. Something around twice the length of the longest
statement in text (as if it were written in NTriples or NQuads) is a
reasonable value.
-l
- Lax (non-strict) parsing. If this is enabled, recoverable syntax errors
will print a warning, but parsing will proceed starting at the next
statement if possible. Note that data may be lost when using this option.
-m
- Build a model in memory. This loads all of the input into memory before
writing the output. This will reorder statements and eliminate duplicates,
at the cost of performance and memory consumption. When writing TriG or
Turtle, this may enable better pretty-printing with more inline
descriptions.
-o syntax
- Write output as syntax. Case is ignored; valid values are: “empty”,
“NQuads”, “NTriples”, “TriG”, and “Turtle”. When “empty” is given, output
is suppressed, so only errors will be printed.
-p prefix
- Add prefix to blank node IDs. This can be used to
avoid clashes between blank node IDs in input documents.
-q
- Suppress all output except data.
-r root
- Keep relative URIs within a root URI. This will
avoid creating any relative URI references with leading path segments like
“../” that enter a parent of root.
-s string
- Parse string as input.
-t
- Write terser output without newlines.
-v
- Display version information and exit.
-w filename
- Write output to the given filename instead of stdout.
-x
- Support parsing variable nodes. Variables can be written in SPARQL style,
for example “?var” or “$var”.
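As a rough illustration of the sizing advice for -k, the following sketch estimates a stack size for an input that is already in NTriples (one statement per line) by doubling the length of the longest line. The file name data.nt and its contents are made up for the example. Falling back to the 4096-byte default for short statements is a choice made here, not documented serdi behaviour, and the factor of two is only the rule of thumb given above.

```shell
# Hypothetical sample input: two NTriples statements, one per line.
tmp=$(mktemp -d)
printf '%s\n' \
  '<http://example.org/a> <http://example.org/p> "short" .' \
  '<http://example.org/a> <http://example.org/p> "a noticeably longer literal value" .' \
  > "$tmp/data.nt"

# Twice the length of the longest statement, but never below the
# default of one page (4096 bytes).
stack_bytes=$(awk '{ n = length($0); if (n > max) max = n }
                   END { s = 2 * max; if (s < 4096) s = 4096; print s }' \
              "$tmp/data.nt")
echo "suggested -k value: $stack_bytes"

# Only run serdi if it is installed; report failure instead of aborting.
if command -v serdi >/dev/null 2>&1; then
  serdi -k "$stack_bytes" -o turtle "$tmp/data.nt" || echo "serdi failed" >&2
fi
```

For inputs in other syntaxes, the longest statement would first have to be estimated as if it were written in NTriples or NQuads, which this sketch does not attempt.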
VALIDATION
The following checks can be enabled with -V and excluded with -X:
- allValuesFrom
- Checks that all properties with owl:allValuesFrom restrictions have valid
value types.
- cardinalityEqual
- Checks that any instance of a class with an owl:cardinality property
restriction has exactly that many values of that property.
- cardinalityMax
- Checks that any instance of a class with an owl:maxCardinality property
restriction has no more than that many values of that property.
- cardinalityMin
- Checks that any instance of a class with an owl:minCardinality property
restriction has at least that many values of that property.
- classLabel
- Checks that every rdfs:Class has an rdfs:label.
- datatypeProperty
- Checks that datatype properties have literal (not instance) values.
- datatypeType
- Checks that every datatype is defined as an rdfs:Datatype.
- deprecatedClass
- Checks that there are no instances of deprecated classes.
- deprecatedProperty
- Checks that there are no uses of deprecated properties.
- functionalProperty
- Checks that no instance has several values of a functional property.
- instanceLiteral
- Checks that there are no instances where a literal is expected.
- instanceType
- Checks that every instance with an explicit type matches that type. This
is a broad check that triggers other type-related checks, but mainly it
will check that every instance of a class conforms to any restrictions on
that class.
- inverseFunctionalProperty
- Checks that at most one instance has a given value of an inverse
functional property.
- literalInstance
- Checks that there are no literals where an instance is expected.
- literalMaxExclusive
- Checks that literal values are not greater than or equal to any applicable
xsd:maxExclusive datatype restrictions.
- literalMaxInclusive
- Checks that literal values are not greater than any applicable
xsd:maxInclusive datatype restrictions.
- literalMinExclusive
- Checks that literal values are not less than or equal to any applicable
xsd:minExclusive datatype restrictions.
- literalMinInclusive
- Checks that literal values are not less than any applicable
xsd:minInclusive datatype restrictions.
- literalPattern
- Checks that literals with xsd:pattern restrictions match the regular
expression pattern for their datatype.
- literalRestriction
- Checks that literals with supported restrictions conform to those
restrictions. This is a high-level check that triggers the more specific
individual literal restriction checks.
- literalValue
- Checks that literals with supported XSD datatypes are valid. The set of
supported types is the same as when writing canonical forms.
- objectProperty
- Checks that object properties have instance (not literal) values.
- plainLiteralDatatype
- Checks that there are no typed literals where a plain literal is expected.
A plain literal may have an optional language tag, but not a
datatype.
- predicateType
- Checks that every predicate is defined as an rdf:Property.
- propertyDomain
- Checks that any instance with a property with an rdfs:domain is in that
domain.
- propertyLabel
- Checks that every rdf:Property has an rdfs:label.
- propertyRange
- Checks that the value for any property with an rdfs:range is in that
range.
- someValuesFrom
- Checks that instances of classes with owl:someValuesFrom property
restrictions have at least one matching property value.
- subclassCycle
- Checks that no class is a sub-class of itself, recursively. This ensures
that the graph is acyclic with respect to rdfs:subClassOf.
- subpropertyCycle
- Checks that no property is a sub-property of itself, recursively. This
ensures that the graph is acyclic with respect to rdfs:subPropertyOf.
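The checks listed above are selected with -V and narrowed with -X. The following sketch enables every check and then excludes the two label checks. The file vocab.ttl and its contents are hypothetical, and passing an alternation of check names to -X assumes that it accepts the same kind of regular expression as -V.

```shell
# Hypothetical vocabulary with a class that has no rdfs:label.
tmp=$(mktemp -d)
cat > "$tmp/vocab.ttl" <<'EOF'
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.org/> .

ex:Thing a rdfs:Class .
EOF

enable='all'                        # special value: enable every check
exclude='classLabel|propertyLabel'  # assumed regex form, names from the list

# Only run serdi if it is installed. With -o empty, only errors are printed.
if command -v serdi >/dev/null 2>&1; then
  if serdi -V "$enable" -X "$exclude" -o empty "$tmp/vocab.ttl"; then
    echo "serdi reported no errors"
  else
    echo "serdi reported errors"
  fi
fi
```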
serdi exits with a status of 0, or non-zero if an error occurred.
To pretty-print a document:
$ serdi -o turtle file.ttl > out.ttl
To print any errors:
$ serdi file.ttl > /dev/null
To remove any rdf:type properties:
$ serdi -F "?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o ." file.ttl
To include only rdf:type properties:
$ serdi -G "?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o ." file.ttl
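Because serdi exits with a non-zero status on error and -o empty suppresses output, the two can be combined to check files from a script. The following is only a sketch: the helper name check_rdf and the sample file good.nt are hypothetical.

```shell
# check_rdf FILE...: report which files serdi can read without error.
check_rdf() {
  for f in "$@"; do
    if serdi -o empty "$f"; then
      echo "ok: $f"
    else
      echo "error: $f"
    fi
  done
}

# Hypothetical sample input: a single NTriples statement.
tmp=$(mktemp -d)
printf '%s\n' '<http://example.org/s> <http://example.org/p> "o" .' \
  > "$tmp/good.nt"

# Only run the check if serdi is installed.
if command -v serdi >/dev/null 2>&1; then
  check_rdf "$tmp/good.nt"
fi
```

Any error messages from serdi still appear alongside the ok/error lines, so the report shows both which file failed and why.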