I had to do some digging around to figure out how to define a new
datatype with restrictions in RDF,
so I thought it might make a useful post to save someone else the
trouble in the future.
RDF datatypes are based on XSD datatypes, which are often used
directly. Unfortunately, most implementations simply have the XSD types
baked in and do not support or validate new datatype descriptions
(though at least sord_validate can). Regardless, it is sometimes
necessary to define a datatype with a specific restriction so that it
can be machine-validated. It's a bit tricky to figure out how to do
this, since everything is buried in specifications that aren't as
triple-oriented as they should be. So, here is an example of defining a
datatype restricted by regular expression in Turtle, derived from the
OWL documentation:
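    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

    <http://example.org/CSymbol>
        a rdfs:Datatype ;
        rdfs:comment "A symbol in the C programming language" ;
        owl:onDatatype xsd:string ;
        owl:withRestrictions (
            [
                xsd:pattern "[_a-zA-Z][_a-zA-Z0-9]*"
            ]
        ) .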
XSD defines several “constraining facets” that can be used in this way.
See the XSD specification for details, but the most obviously useful
ones for RDF are xsd:length, xsd:minLength, xsd:maxLength, xsd:pattern,
xsd:maxInclusive, xsd:maxExclusive, xsd:minInclusive, and
xsd:minExclusive. For example, you can define a numeric type with a
restricted range like so:
    <http://example.org/AnswerishInteger>
        a rdfs:Datatype ;
        rdfs:comment "An integer between 24 and 42 inclusive" ;
        owl:onDatatype xsd:integer ;
        owl:withRestrictions (
            [
                xsd:minInclusive 24
            ] [
                xsd:maxInclusive 42
            ]
        ) .
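The length facets follow the same shape. As a rough sketch, here is a string type with a bounded length. The name ShortLabel and the specific limits are made up for illustration, and I am assuming that plain integer literals are accepted as facet values, as they are for the range facets:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical datatype for illustration only
<http://example.org/ShortLabel>
    a rdfs:Datatype ;
    rdfs:comment "A non-empty string of at most 16 characters" ;
    owl:onDatatype xsd:string ;
    owl:withRestrictions (
        [
            xsd:minLength 1
        ] [
            xsd:maxLength 16
        ]
    ) .
```

Each facet sits in its own blank node in the restriction list, and a value has to satisfy all of them.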
Defining datatypes in this way and using them as the rdfs:range for
properties is a good idea because it describes which values are valid in
a machine-readable way. This makes it possible for simple generic
tools to validate data, ensuring that all literals are valid values for
the property they describe.
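To make that concrete, here is a small sketch of a property that uses the CSymbol datatype as its range, along with one value that fits the pattern and one that does not. The property eg:symbol and the two subjects are hypothetical names invented for this example, and exactly how (or whether) a given tool reports the second value depends on the tool:

```turtle
@prefix eg:   <http://example.org/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical property whose values must be C symbols
eg:symbol
    a rdf:Property ;
    rdfs:comment "The C symbol that names something" ;
    rdfs:range eg:CSymbol .

# Matches the pattern: starts with a letter, then letters, digits, or underscores
eg:goodFunction
    eg:symbol "do_thing"^^eg:CSymbol .

# Does not match the pattern: starts with a digit
eg:badFunction
    eg:symbol "2fast"^^eg:CSymbol .
```

A validator that understands the restriction has enough information here to flag "2fast" without knowing anything specific about the eg:symbol property.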