NAME

serd-pipe - read and write RDF data

SYNOPSIS

serd-pipe [-ChV] [-B base] [-I syntax] [-O syntax] [-R root] [-b bytes] [-k bytes] [-o filename] [-s string] input ...

DESCRIPTION

serd-pipe is a fast command-line utility for streaming RDF data. It reads one or more files and writes the data again, possibly in a different form. By default, the input syntax is guessed from the file extension, and line-based output is written to standard output.

serd-pipe writes statements as they are read, in the same order. It uses very little memory and can process arbitrarily large files, either directly or as part of a pipeline. It is useful for things like checking syntax, converting to a different syntax, pretty-printing documents, merging files, expanding URIs, and so on.

The simplest way to use serd-pipe is by giving files for both input and output. This way, reasonable options are chosen by default based on the filename. For example, most common tasks can be accomplished with simple commands like

$ serd-pipe -o pretty.ttl input.nt

Standard input can be read by using - instead of a filename, and giving the input syntax explicitly:

$ cat file.ttl | serd-pipe -I turtle -

The options are as follows:

-B base

Base URI or path, or the special argument rebase to use the output path. The base is used to resolve any relative URI references in the input.

If the input is a file, its URI is used as the base by default. This causes relative references to be written just as they are in the input. Note, however, that this may not be desired if the output is in a different directory. For example, <file.ttl> would not point to the same file from the new location.

The special rebase argument will instead use the output filename set by the -o option. This will write references relative to the output file, so that parsing it will produce the same absolute URIs as the original input. For example, the above may be written as <../file.ttl> if the output is written to some sibling directory.

Generally, the default is best when copying data along with other bundled files, while rebase is best for writing data in a new location which still refers to the original paths.

These options are intended to make the most common tasks as simple as possible. An arbitrary base URI can also be given explicitly.
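The difference between the default base and rebase can be tried with a short script like the following. This is only a sketch: the data and out directories, the file names, and the example.org predicate are made up for illustration, and the commands are skipped if serd-pipe is not installed.

```shell
# Hypothetical layout: data/input.ttl is written to a sibling directory, out/.
mkdir -p data out
printf '<file.ttl> <http://example.org/seeAlso> <other.ttl> .\n' > data/input.ttl

if command -v serd-pipe >/dev/null 2>&1; then
  # Default base (the input file): relative references are kept as written.
  serd-pipe -o out/default.ttl data/input.ttl

  # rebase: references are written relative to the output file instead.
  serd-pipe -B rebase -o out/rebased.ttl data/input.ttl
fi
```

Comparing out/default.ttl with out/rebased.ttl should show the two behaviors described above.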

-C

Convert literals to canonical form. Literals with supported XSD datatypes will be parsed and rewritten canonically. Invalid literals will cause an error. All numeric datatypes are supported, as well as boolean, duration, dateTime, time, hexBinary, and base64Binary.
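As a rough illustration, the script below writes a decimal literal in a non-canonical form and passes it through with -C. The file name canon.ttl and the example.org URIs are assumptions for the sake of the example, and the command is skipped if serd-pipe is not installed.

```shell
# A decimal literal with a redundant sign, leading zero, and trailing zero.
printf '%s\n' \
  '<http://example.org/s> <http://example.org/p> "+01.50"^^<http://www.w3.org/2001/XMLSchema#decimal> .' \
  > canon.ttl

if command -v serd-pipe >/dev/null 2>&1; then
  # With -C, the literal is expected to be rewritten in canonical form.
  serd-pipe -C -O turtle canon.ttl
fi
```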

-I syntax

Set an input syntax or option. May be given multiple times. The case-insensitive syntax can be NQuads, NTriples, TriG, Turtle, or one of the following options:

lax: Tolerate invalid input where possible. Warnings will be printed for syntax errors, but parsing will attempt to continue. Note that data may be lost when using this option!
variables: Support parsing variable nodes. Variables can be written in SPARQL style, for example ?name or $name.
relative: Read relative URI references exactly without resolving them. Normally, all relative URIs are expanded against the base URI when reading. This flag disables that, so URI references will be passed through exactly as they are in the input.
global: Assume a clean global namespace for blank node labels, and do not automatically add prefixes. Normally, a prefix like f1 is added to blank node labels when reading multiple files, to prevent labels in different files from clashing. This option disables that, so blank node labels will be passed through without any added prefix. Note that this may corrupt the output by merging distinct blank nodes.
generated: Read seemingly generated blank node labels exactly without adjusting them. Normally, blank node labels like b123 are adapted to avoid potential clashes with generated ones. This flag disables that, so such labels will be passed through exactly as they are in the input. Note that this may corrupt the output by merging distinct blank nodes.

-O syntax

Set an output syntax or option. May be given multiple times. The case-insensitive syntax can be empty, NQuads, NTriples, TriG, Turtle, or one of the following options:

ascii: Escape all non-ASCII characters. Normally, text is written in UTF-8. This flag will escape non-ASCII characters in text as Unicode code points like \u00B7 or \U0001F600.
contextual: Suppress writing directives that describe the context. Normally when writing Turtle or TriG, a document will have a header that defines all the prefixes used in the input. This flag will disable writing those directives, so the output is a document fragment with an implicit context. This can be useful for writing output intended for humans.
expanded: Write expanded URIs instead of prefixed names.
verbatim: Write URI references exactly as they are in the input. This avoids resolving URIs and making them relative to the output base URI.
terse: Write terser output without newlines. This can be useful for writing a line-based description of suitably structured data.
lax: Tolerate invalid UTF-8 by writing the replacement character when necessary. Note that data may be lost when using this option!

The empty syntax suppresses the output, so that only warnings and errors will be printed.
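Since -O may be given multiple times, a syntax and several options can be combined in one command. The sketch below assumes a small sample file named input.ttl with a made-up eg: prefix, and skips the commands if serd-pipe is not installed.

```shell
# Sample input with a prefixed name and a non-ASCII character (UTF-8 "é").
printf '@prefix eg: <http://example.org/> .\neg:s eg:p "caf\303\251" .\n' > input.ttl

if command -v serd-pipe >/dev/null 2>&1; then
  # Write Turtle with full URIs and all non-ASCII characters escaped.
  serd-pipe -O turtle -O expanded -O ascii input.ttl

  # Check the syntax only: nothing but warnings and errors is printed.
  serd-pipe -O empty input.ttl
fi
```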

-R root

Keep relative URIs within a root URI. This will avoid creating any relative URI references with leading path segments like ../ that enter a parent of root.

For example, if /home/you/file.ttl is written to the file /home/me/output.ttl using -B rebase, then it will be written as <../you/file.ttl>. Setting -R /home/me/ would prevent references from “escaping” like this, so the above would instead be written as <file:///home/you/file.ttl>.

This is useful for making relocatable “bundles” of resources, since it can keep all relative references within the bundle, while still allowing up-references to be used.
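The scenario above can be reproduced on a small scale with a sketch like this one. The you and me directories stand in for /home/you and /home/me, the file names are hypothetical, and the commands are skipped if serd-pipe is not installed.

```shell
# Hypothetical stand-ins for /home/you/file.ttl and /home/me/.
mkdir -p you me
printf '<file.ttl> <http://example.org/p> "o" .\n' > you/file.ttl

if command -v serd-pipe >/dev/null 2>&1; then
  # Without a root, the rebased reference may climb out of me/ via ../
  serd-pipe -B rebase -o me/rebased.ttl you/file.ttl

  # With me/ as the root, references outside it should stay absolute.
  serd-pipe -B rebase -R "$PWD/me/" -o me/rooted.ttl you/file.ttl
fi
```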

-V

Display version information and exit.

-b bytes

I/O block size. This is the number of bytes in a file that will be read or written at once. The default is 4096, which should perform well in most cases. Note that this only applies to files; standard input and output are always processed one byte at a time.

-h

Print the command line options.

-k bytes

Parser stack size. For performance and security reasons, parsing is performed with a fixed-size stack. This option sets a hard limit on the total amount of space used for parsing. The default is 1 megabyte, which should be more than enough for most data. This option can be used to reduce memory consumption, or to enable parsing documents with extremely deep nesting or extremely large literal values.

-o filename

Write output to the given filename instead of stdout.

-s string

Parse string as input.

ENVIRONMENT

Error messages and warnings are printed in color by default if the output is a terminal. This can be controlled by common environment variables:

NO_COLOR: If present (regardless of value), color is disabled.
CLICOLOR: If set to 0, color is disabled.
CLICOLOR_FORCE: If set to anything other than 0, color is forced on.

See http://no-color.org/ and https://bixense.com/clicolors/ for details.
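For example, these variables can be set for a single command from the shell. This sketch assumes a sample file named input.nt and skips the commands if serd-pipe is not installed.

```shell
printf '<http://example.org/s> <http://example.org/p> "o" .\n' > input.nt

if command -v serd-pipe >/dev/null 2>&1; then
  # Disable color for this invocation only.
  NO_COLOR=1 serd-pipe -O empty input.nt

  # Force color even though the messages go to a pipe, not a terminal.
  CLICOLOR_FORCE=1 serd-pipe -O empty input.nt 2>&1 | cat
fi
```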

EXIT STATUS

serd-pipe exits with a status of 0 on success, or non-zero if an error occurred.
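The exit status makes serd-pipe usable as a syntax check in scripts. The following is a minimal sketch that assumes a sample file named input.nt; the printed messages are arbitrary, and the check is skipped if serd-pipe is not installed.

```shell
printf '<http://example.org/s> <http://example.org/p> "o" .\n' > input.nt

if command -v serd-pipe >/dev/null 2>&1; then
  # Discard the output and branch on the exit status alone.
  if serd-pipe -O empty input.nt; then
    echo "input.nt: no errors"
  else
    echo "input.nt: errors found" >&2
  fi
fi
```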

EXAMPLES

To print an NTriples file as Turtle:

$ serd-pipe -O turtle input.nt

To print only errors and discard the output:

$ serd-pipe -O empty input.ttl

To pretty-print a file:

$ serd-pipe -o pretty.ttl input.ttl

To expand all prefixed names into full URIs:

$ serd-pipe -O expanded -o expanded.ttl input.ttl

To merge two files:

$ serd-pipe -o merged.ttl header.ttl body.ttl

STANDARDS

W3C, RDF 1.1 NQuads, February 2014. https://www.w3.org/TR/n-quads/
W3C, RDF 1.1 NTriples, February 2014. https://www.w3.org/TR/n-triples/
W3C, RDF 1.1 TriG, February 2014. https://www.w3.org/TR/trig/
W3C, RDF 1.1 Turtle, February 2014. https://www.w3.org/TR/turtle/

AUTHORS

serd-pipe is a part of serd, by David Robillard <d@drobilla.net>.