I had to do some digging around to figure out how to define a new datatype with restrictions in RDF, so I thought it might make a useful post to save someone else the trouble in the future.
RDF datatypes are based on XSD datatypes, which are often used directly. Unfortunately, most implementations simply have the XSD types baked in and do not support or validate new datatype descriptions (though at least sord_validate can). Regardless, it is sometimes necessary to define a datatype with a specific restriction so it can be machine validated. It's a bit tricky to figure out how to do this, since everything is buried in specifications that aren't as triple-oriented as they should be. So, here is an example of defining a datatype restricted by regular expression in Turtle, derived from the OWL documentation:
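@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/CSymbol>
    a rdfs:Datatype ;
    rdfs:comment "A symbol in the C programming language" ;
    owl:onDatatype xsd:string ;
    owl:withRestrictions ( [ xsd:pattern "[_a-zA-Z][_a-zA-Z0-9]*" ] ) .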
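To make the effect of that restriction concrete, here is a minimal Python sketch of the check a validator would perform for this datatype. It relies on the fact that an xsd:pattern must match the entire lexical form, and it assumes this particular pattern means the same thing in Python's `re` syntax, which is not true of XSD regular expressions in general. The name `is_c_symbol` is made up for illustration.

```python
import re

# The xsd:pattern from the CSymbol datatype above.
C_SYMBOL_PATTERN = "[_a-zA-Z][_a-zA-Z0-9]*"


def is_c_symbol(lexical_form: str) -> bool:
    """Return True if the literal's lexical form satisfies the pattern facet.

    XSD patterns are implicitly anchored, so the whole string must match;
    re.fullmatch gives the same behaviour for this pattern.
    """
    return re.fullmatch(C_SYMBOL_PATTERN, lexical_form) is not None


if __name__ == "__main__":
    for candidate in ["main", "_private1", "2fast", "has space", ""]:
        print(repr(candidate), is_c_symbol(candidate))
```

Using `re.search` or `re.match` instead would wrongly accept strings like "has space", since only a prefix would need to match.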
The XSD specification defines several constraining facets you can use in this way. See it for details, but the most obvious and useful for RDF are: xsd:length, xsd:minLength, xsd:maxLength, xsd:pattern, xsd:maxInclusive, xsd:maxExclusive, xsd:minInclusive, xsd:minExclusive. For example, you can define a numeric type with a restricted range like so:
<http://example.org/AnswerishInteger>
    a rdfs:Datatype ;
    rdfs:comment "An integer between 24 and 42 inclusive" ;
    owl:onDatatype xsd:integer ;
    owl:withRestrictions ( [ xsd:minInclusive 24 ] [ xsd:maxInclusive 42 ] ) .
Defining datatypes in this way and using them as the rdfs:range for properties is a good idea because it describes which values are valid in a machine-readable way. This makes it possible for simple generic tools to validate data, ensuring that all literals are valid values for the property they describe.
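As a rough illustration of what such a generic tool does, here is a small Python sketch that validates literals against the two datatypes defined above. It is not how sord_validate or any other real tool works: the definitions are hard-coded as a dictionary rather than parsed from Turtle, the properties `http://example.org/symbol` and `http://example.org/answer` are hypothetical, only a handful of facets are handled, and XSD patterns are assumed to be compatible with Python's `re` syntax.

```python
import re

# Hypothetical in-memory form of the two datatype definitions above:
# datatype IRI -> (base datatype, list of (facet, value) restrictions).
DATATYPES = {
    "http://example.org/CSymbol": (
        "string", [("pattern", "[_a-zA-Z][_a-zA-Z0-9]*")]),
    "http://example.org/AnswerishInteger": (
        "integer", [("minInclusive", 24), ("maxInclusive", 42)]),
}

# Hypothetical properties whose rdfs:range is one of those datatypes.
RANGES = {
    "http://example.org/symbol": "http://example.org/CSymbol",
    "http://example.org/answer": "http://example.org/AnswerishInteger",
}


def facet_holds(facet, limit, lexical_form, value):
    """Check a single constraining facet against one literal."""
    if facet == "pattern":
        # Patterns apply to the lexical form and must match all of it.
        return re.fullmatch(limit, lexical_form) is not None
    if facet == "minLength":
        return len(lexical_form) >= limit
    if facet == "maxLength":
        return len(lexical_form) <= limit
    if facet == "minInclusive":
        return value >= limit
    if facet == "maxInclusive":
        return value <= limit
    raise ValueError(f"unsupported facet: {facet}")


def is_valid(prop, lexical_form):
    """Return True if the literal is a valid value for the property."""
    base, restrictions = DATATYPES[RANGES[prop]]
    if base == "integer":
        # xsd:integer lexical form: optional sign followed by digits.
        if re.fullmatch(r"[+-]?[0-9]+", lexical_form) is None:
            return False
        value = int(lexical_form)
    else:
        value = lexical_form
    return all(facet_holds(facet, limit, lexical_form, value)
               for facet, limit in restrictions)


if __name__ == "__main__":
    print(is_valid("http://example.org/symbol", "my_var"))
    print(is_valid("http://example.org/answer", "43"))
```

The point of the design is that the validator knows nothing about C symbols or answerish integers: everything specific lives in the datatype descriptions, so new datatypes need new data rather than new code.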