Virtuoso's triple store has supported optional full-text indexing of RDF object values since version 5.0. It is possible to declare that objects of triples with a given predicate or graph get indexed. The graphs and predicates may be enumerated, or a wildcard may be used.
The triples for which a full text index entry exists can be found using the bif:contains or related filters and predicates.
For example, the query:
SQL> SPARQL SELECT * FROM <people> WHERE { ?s foaf:name ?name . ?name bif:contains "'rich*'" . };
would match all subjects whose foaf:name contains a word starting with "rich". Matching is case-insensitive, so this would match Richard, Richie, etc.
Note that words and phrases should be enclosed in quotes if they contain spaces or other non-alphanumeric characters.
If the bif:contains or related predicate is applied to an object that is not a string or is not the object of an indexed triple, no match will be found.
The syntax for text patterns is identical to the syntax for the SQL contains predicate.
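As an illustration, here are a few text patterns (a sketch; the graph and the foaf:name property are carried over from the example above, and the operators shown are those of the SQL contains syntax):

```sql
-- Single word; free-text matching is case-insensitive
SPARQL SELECT ?s FROM <people> WHERE { ?s foaf:name ?name . ?name bif:contains "'tim'" };

-- Words containing non-alphanumeric characters must be quoted
SPARQL SELECT ?s FROM <people> WHERE { ?s foaf:name ?name . ?name bif:contains "'Berners-Lee'" };

-- Boolean combination of terms
SPARQL SELECT ?s FROM <people> WHERE { ?s foaf:name ?name . ?name bif:contains "'tim' AND 'berners'" };
```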
The SPARQL/SQL optimizer determines whether the text pattern will be used to drive the query or whether it will filter results after other conditions are applied first. In contrast to bif:contains, regexp matching never drives the query or makes use of an index, thus in practice regexps are checked after other conditions.
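To make the contrast concrete, here is a sketch of two queries with similar intent; the first can be driven by the free-text index, while the second must produce candidate bindings first and test each with regex() afterwards:

```sql
-- Index-driven: candidate objects come straight from the text index
SPARQL SELECT ?s ?o WHERE { ?s ?p ?o . ?o bif:contains "'rich*'" };

-- Filter-only: regex() is applied after other conditions produce bindings
SPARQL SELECT ?s ?o WHERE { ?s ?p ?o . FILTER (regex (?o, "^rich", "i")) };
```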
Whether the object of a given triple is indexed in the text index depends on indexing rules. If at least one indexing rule matches the triple, the object gets indexed if the object is a string. An indexing rule specifies a graph and a predicate. Either may be an IRI or NULL, in which case it matches any IRI.
Rules also have a 'reason', which can be used to group rules into application-specific sets. A triple will stop being indexed only after all rules mandating its indexing are removed. When an application requires indexing a certain set of triples, rules are added for that purpose. These rules are tagged with the name of the application as their reason. When an application no longer requires indexing, the rules belonging to this application can be removed. This will not turn off indexing if another application still needs certain triples to stay indexed.
Indexing is enabled/disabled for specific graph/predicate combinations with:
create function DB.DBA.RDF_OBJ_FT_RULE_ADD (in rule_g varchar, in rule_p varchar, in reason varchar) returns integer
create function DB.DBA.RDF_OBJ_FT_RULE_DEL (in rule_g varchar, in rule_p varchar, in reason varchar) returns integer
The first function adds a rule. The first two arguments are the text representations of the IRIs for the graph and predicate. If NULL is given then all graphs or predicates match. Specifying both as NULL means that all string-valued objects will be added to the text index.
Example:
DB.DBA.RDF_OBJ_FT_RULE_ADD (null, null, 'All');
The second function reverses the effect of the first. Only a rule that has actually been added can be deleted. Thus one cannot say that all except a certain enumerated set should be indexed.
DB.DBA.RDF_OBJ_FT_RULE_DEL (null, null, 'All');
The reason argument is an arbitrary string identifying the application that needs this rule. Two applications can add the same rule; removing one of them will still keep the rule in effect. If an object is indexed by more than one rule, the index data remain free of duplicates; neither index size nor speed is affected.
If DB.DBA.RDF_OBJ_FT_RULE_ADD detects that DB.DBA.RDF_QUAD contains quads whose graphs and/or predicates match the new rule but which have not been indexed before, then these quads are indexed automatically. However, the function DB.DBA.RDF_OBJ_FT_RULE_DEL does not remove indexing data about related objects. Thus the presence of indexing data about an object does not imply that it is necessarily used in some quad that matches some rule.
The above functions return one if the rule is added or deleted and zero if the call was redundant (the rule has been added before or there's no rule to delete).
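A sketch of the return values (the graph IRI and reason string are illustrative):

```sql
SELECT DB.DBA.RDF_OBJ_FT_RULE_ADD ('http://example.com/g', null, 'MyApp');  -- returns 1: rule added
SELECT DB.DBA.RDF_OBJ_FT_RULE_ADD ('http://example.com/g', null, 'MyApp');  -- returns 0: already present
SELECT DB.DBA.RDF_OBJ_FT_RULE_DEL ('http://example.com/g', null, 'MyApp');  -- returns 1: rule deleted
SELECT DB.DBA.RDF_OBJ_FT_RULE_DEL ('http://example.com/g', null, 'MyApp');  -- returns 0: nothing to delete
```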
-- We load Tim Berners-Lee's FOAF file into a graph called 'people'.
SQL> DB.DBA.RDF_LOAD_RDFXML (http_get ('http://www.w3.org/People/Berners-Lee/card#i'), 'no', 'http://www.w3.org/people#');
Done. -- 172 msec.

-- We check how many triples we got.
SQL> SPARQL SELECT COUNT (*) FROM <http://www.w3.org/people#> WHERE { ?s ?p ?o };
callret-0
INTEGER
266

No. of rows in result: 1

-- We check the graph <http://www.w3.org/people#> for objects like "Tim":
SQL> SPARQL SELECT * FROM <http://www.w3.org/people#> WHERE { ?s ?p ?o . FILTER (?o LIKE '%Tim%') };
s  p  o
VARCHAR  VARCHAR  VARCHAR
_______________________________________________________________________________
http://www.w3.org/People/Berners-Lee/card#i  http://xmlns.com/foaf/0.1/name  Timothy Berners-Lee
http://www.w3.org/People/Berners-Lee/card#i  http://xmlns.com/foaf/0.1/nick  TimBL
http://www.w3.org/People/Berners-Lee/card#i  http://www.w3.org/2002/07/owl#sameAs  http://www4.wiwiss.fu-berlin.de/bookmashup/persons/Tim+Berners-Lee
http://www.w3.org/People/Berners-Lee/card#i  http://xmlns.com/foaf/0.1/knows  http://dbpedia.org/resource/Tim_Bray
http://www.w3.org/People/Berners-Lee/card#i  http://www.w3.org/2000/01/rdf-schema#label  Tim Berners-Lee
http://www.w3.org/People/Berners-Lee/card#i  http://xmlns.com/foaf/0.1/givenname  Timothy
http://dbpedia.org/resource/Tim_Bray  http://xmlns.com/foaf/0.1/name  Tim Bray
no  http://purl.org/dc/elements/1.1/title  Tim Berners-Lee's FOAF file

8 Rows. -- 230 msec.

-- We specify that all string objects in the graph 'people' should be text indexed.
SQL> DB.DBA.RDF_OBJ_FT_RULE_ADD ('http://www.w3.org/people#', null, 'people');
Done. -- 130 msec.

-- We update the text index.
SQL> DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ ();
Done. -- 140 msec.

-- See the impact of the index by querying the subjects and predicates
-- of all triples in the graph <http://www.w3.org/people#>
-- where the object is a string containing a word beginning with "Timo".
SQL> SPARQL SELECT * FROM <http://www.w3.org/people#> WHERE { ?s ?p ?o . ?o bif:contains '"Timo*"' };
s  p  o
VARCHAR  VARCHAR  VARCHAR
_______________________________________________________________________________
http://www.w3.org/People/Berners-Lee/card#i  http://xmlns.com/foaf/0.1/name  Timothy Berners-Lee
http://www.w3.org/People/Berners-Lee/card#i  http://xmlns.com/foaf/0.1/givenname  Timothy

2 Rows. -- 2 msec.
The query below is identical to the one above but uses a different syntax. The filter syntax is more flexible in that it allows passing extra options to the contains predicate. These may be useful in the future.
SQL> SPARQL SELECT * FROM <http://www.w3.org/people#> WHERE { ?s ?p ?o . FILTER (bif:contains (?o, '"Timo*"')) };
It is advisable to upgrade to the latest version of Virtuoso before adding free-text rules for the first time. This is especially the case if large amounts of text are to be indexed. The reason is that the free-text index on RDF may be changed in future versions and automatic upgrading of an existing index data into the new format may take much more time than indexing from scratch.
The table DB.DBA.RDF_OBJ_FT_RULES stores the list of free-text index configuration rules.
create table DB.DBA.RDF_OBJ_FT_RULES (
  ROFR_G varchar not null,       -- specific graph IRI or NULL for "all graphs"
  ROFR_P varchar not null,       -- specific predicate IRI or NULL for "all predicates"
  ROFR_REASON varchar not null,  -- identification string of a creator, preferably human-readable
  primary key (ROFR_G, ROFR_P, ROFR_REASON)
);
Applications may read from this table but they should not write to it directly. Duplications in the rules do not affect the speed of free-text index operations because the content of the table is cached in memory in a special form. Unlike the use of configuration functions, directly writing to the table will not update the in-memory cache.
The table is convenient to search for rules added by a given application. If a unique identification string is used during installation of an application when rules are added then it's easy to remove those rules as part of any uninstall routine.
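For example, an uninstall routine for a hypothetical application registered under the reason 'MyApp' might first look up its rules and then remove them through the configuration function, never by a direct DELETE, which would leave the in-memory cache stale:

```sql
-- Find the rules added by 'MyApp'
SELECT ROFR_G, ROFR_P FROM DB.DBA.RDF_OBJ_FT_RULES WHERE ROFR_REASON = 'MyApp';

-- Remove each found rule via the API so the in-memory cache stays consistent
DB.DBA.RDF_OBJ_FT_RULE_DEL ('http://example.com/g', null, 'MyApp');
```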
The triple store's text index is in manual batch mode by default. This means that changes in triples are periodically reflected in the text index but are not maintained in strict synchrony. This is much more efficient than keeping the indexes in constant synchrony. This setting may be altered with the DB.DBA.VT_BATCH_UPDATE stored procedure.
To force synchronization of the RDF text index, use:
DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ ();
To set the text index to follow the triples in real time, use:
DB.DBA.VT_BATCH_UPDATE ('DB.DBA.RDF_OBJ', 'OFF', null);
To set the text index to be updated every 10 minutes, use:
DB.DBA.VT_BATCH_UPDATE ('DB.DBA.RDF_OBJ', 'ON', 10);
To make the update always manual, specify NULL as the last argument above.
One problem related to free-text indexing of DB.DBA.RDF_QUAD is that some applications (e.g. those that import billions of triples) may switch off triggers to speed up loading, which leaves the free-text index data incomplete. Calling the procedure DB.DBA.RDF_OBJ_FT_RECOVER () will insert all missing free-text index items by dropping and re-inserting every existing free-text index rule.
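A minimal sketch of the recovery step after such a bulk load:

```sql
-- Re-derive all missing free-text index entries by dropping and
-- re-inserting every installed free-text index rule
DB.DBA.RDF_OBJ_FT_RECOVER ();
```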
If an O field of a quad map pattern gets its value from a database column that has a free text index then this index can be used in SPARQL for efficient text searching. As a variation of this facility, the free-text index of another table may be used.
If a statement of a quad map pattern declaration starts with a declaration of table aliases, the table alias declaration may include the name of a table column that should have a text index. For example, consider the possibility of using a free-text index on the content of DAV resources stored in the DAV system tables of Virtuoso:
prefix mydav: <...>
create quad storage mydav:metadata
FROM WS.WS.SYS_DAV_RES as dav_resource text literal RES_CONTENT
...
{
  ...
  mydav:resource-iri (dav_resource.RES_FULL_PATH) a mydav:resource ;
    mydav:resource-content dav_resource.RES_CONTENT ;
    mydav:resource-mime-type dav_resource.RESTYPE ;
  ...
}
The clause text literal RES_CONTENT grants the SPARQL compiler permission to use a free-text index for objects that are literals composed from the column dav_resource.RES_CONTENT. The declaration also allows a choice between text literal (supports only the contains() predicate) and text xml literal (supports both contains() and xcontains()). It is important to understand that the free-text index will produce results using raw relational data. If a literal class transformation changes the text stored in the column, these changes are ignored by free-text search; e.g., if a transformation concatenates a word to the value of the column, the free-text search will not find that word.
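For instance, to make xcontains() available over the DAV resource content, the alias declaration above could use text xml literal instead (a sketch; the rest of the declaration is elided exactly as in the example above):

```sql
create quad storage mydav:metadata
FROM WS.WS.SYS_DAV_RES as dav_resource text xml literal RES_CONTENT
...
```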
The free-text index may be used in a more sophisticated way. Consider a built-in table DB.DBA.RDF_QUAD that does not have a free-text index. Moreover, the table does not contain the full values of all objects; the O column contains "short enough" values inlined, but long and special values are represented by links to the DB.DBA.RDF_OBJ table. The RDF_OBJ table, however, has free-text index that can be used. The full declaration of the built-in default mapping for default storage could be written this way:
-- Important! Do not try to execute on a live system
-- without first changing the quad storage and quad map pattern names!
SPARQL
create virtrdf:DefaultQuadMap as
  graph rdfdf:default-iid-nonblank (DB.DBA.RDF_QUAD.G)
  subject rdfdf:default-iid (DB.DBA.RDF_QUAD.S)
  predicate rdfdf:default-iid-nonblank (DB.DBA.RDF_QUAD.P)
  object rdfdf:default (DB.DBA.RDF_QUAD.O)

create quad storage virtrdf:DefaultQuadStorage
FROM DB.DBA.RDF_QUAD as physical_quad
FROM DB.DBA.RDF_OBJ as physical_obj
  text xml literal RO_DIGEST of (physical_quad.O)
WHERE (^{physical_quad.}^.O = ^{physical_obj.}^.RO_DIGEST)
{
  create virtrdf:DefaultQuadMap as
    graph rdfdf:default-iid-nonblank (physical_quad.G)
    subject rdfdf:default-iid (physical_quad.S)
    predicate rdfdf:default-iid-nonblank (physical_quad.P)
    object rdfdf:default (physical_quad.O) .
}
;
The reference to the free-text index is extended by the clause of (physical_quad.O). This means that the free-text index on DB.DBA.RDF_OBJ.RO_DIGEST will be used when the object value comes from physical_quad.O, as if physical_quad.O were indexed itself. If a SPARQL query invokes virtrdf:DefaultQuadMap but contains no free-text criteria, then only DB.DBA.RDF_QUAD appears in the final SQL statement and no join with DB.DBA.RDF_OBJ is made. Adding a free-text predicate will add DB.DBA.RDF_OBJ to the list of source tables along with a join condition for DB.DBA.RDF_QUAD.O and DB.DBA.RDF_OBJ.RO_DIGEST; and it will add a contains (RO_DIGEST, ...) predicate, rather than contains (O, ...). As a result, "you pay only for what you use": adding a free-text index to the declaration does not add tables to the query unless the index is actually used.
Boolean functions bif:contains and bif:xcontains are used for objects that come from RDF Views as well as for regular "physical" triples. Each function takes two arguments and returns a boolean value. The first argument is a local variable. The argument variable should be used as an object field in the group pattern where the filter condition is placed. Moreover, the occurrence of the variable in an object field should be placed before the filter. If there are many occurrences of the variable in object fields then the free-text search is associated with the rightmost occurrence that is still to the left of the filter. The triple pattern that contains the rightmost occurrence is called the "intake" of the free-text search. When the SPARQL compiler chooses the appropriate quad map patterns that may generate data matching the intake triple pattern, it skips quad map patterns that have no declared free-text indexes, because nothing can be found by free-text search in data that have no free-text index. Every quad map pattern that has a free-text pattern will ultimately produce an invocation of the SQL contains or xcontains predicate, so the final result of a free-text search may be a union of free-text searches from different quad map patterns.
The described logic is important only in very complicated cases, whereas simple queries are self-evident:
SELECT * FROM <my-dav-graph>
WHERE
  {
    ?resource a mydav:resource ;
      mydav:resource-content ?text .
    FILTER (bif:contains (?text, "hello and world"))
  }
or, more succinctly,
SELECT * FROM <my-dav-graph>
WHERE
  {
    ?resource a mydav:resource ;
      mydav:resource-content ?text .
    ?text bif:contains "hello and world" .
  }
SQL> SPARQL
SELECT * WHERE { ?s ?p ?o . ?o bif:contains 'NEW AND YORK' OPTION (score ?sc) . }
ORDER BY DESC (?sc) LIMIT 10;
s  p  o  sc
ANY  ANY  ANY  ANY
______________________________________________________________________________________________________________________________________________________________________________
http://dbpedia.org/resource/New_York%2C_New_York_%28disambiguation%29  http://www.w3.org/2000/01/rdf-schema#comment  New York, New York, New York kentini........  88
http://dbpedia.org/resource/New_York%2C_New_York_%28disambiguation%29  http://dbpedia.org/property/abstract  New York, New York, New York kentinin re....  88
http://newyorkjobs.wordpress.com/2006/07/10/new-york-jobs-71006  http://purl.org/dc/elements/1.1/description  York Marketing Jobs New York Retail Jobs....  84
http://dbpedia.org/resource/Global_Communication  http://dbpedia.org/property/contenu  A - New York, New York (Headfuq Mix) B1 ....  84
http://dbpedia.org/resource/New_York_%28disambiguation%29  http://www.w3.org/2000/01/rdf-schema#comment  New York – New York amerikai város ....  76
http://dbpedia.org/resource/New_York_%28disambiguation%29  http://dbpedia.org/property/abstract  New York – New York amerikai város ....  76
http://dbpedia.org/resource/New_York_%28disambiguation%29  http://www.w3.org/2000/01/rdf-schema#comment  New York ima lahko naslednje pomene: New ...  74
http://dbpedia.org/resource/New_York_%28disambiguation%29  http://dbpedia.org/property/abstract  New York ima lahko naslednje pomene: New ...  74
http://dbpedia.org/resource/New_York_College  http://www.w3.org/2000/01/rdf-schema#comment  There are several colleges of New York t ...  72
http://dbpedia.org/resource/New_York_College  http://dbpedia.org/property/abstract  There are several colleges of New York t ...  72

No. of rows in result: 10
Starting with version 5.0, Virtuoso supports the SPARQL/Update (SPARUL) extension to SPARQL. This is sufficient for most routine data manipulation operations. If the SPARQL_UPDATE role is granted to the SPARQL user, then data manipulation statements may be executed via the SPARQL web service endpoint as well as data querying.
Two functions allow the user to alter RDF storage by inserting or deleting all triples listed in some vector. Both functions receive the IRI of the graph that should be altered and a vector of triples that should be added or removed. The graph IRI can be either an IRI ID or a string. The third, optional argument controls the transactional behavior: the parameter value is passed to the log_enable function. The return values of these functions are not defined and should not be used by applications.
create function DB.DBA.RDF_INSERT_TRIPLES (in graph_iri any, in triples any, in log_mode integer := null)
create function DB.DBA.RDF_DELETE_TRIPLES (in graph_iri any, in triples any, in log_mode integer := null)
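A hedged sketch of a call: the triples argument is a vector of (S, P, O) vectors, with IRIs given as IRI IDs (the graph and subject names here are illustrative):

```sql
DB.DBA.RDF_INSERT_TRIPLES (
  'http://example.com/g',
  vector (
    vector (iri_to_id ('http://example.com/alice'),
            iri_to_id ('http://xmlns.com/foaf/0.1/name'),
            'Alice'),
    vector (iri_to_id ('http://example.com/bob'),
            iri_to_id ('http://xmlns.com/foaf/0.1/name'),
            'Bob')));
```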
Simple operations may be faster if written as low-level SQL code instead of SPARUL. For instance, SPARQL DELETE is unnecessary when the application can delete directly from RDF_QUAD using a simple SQL filter such as:
DELETE FROM DB.DBA.RDF_QUAD
WHERE G = DB.DBA.RDF_MAKE_IID_OF_QNAME
  ('http://local.virt/DAV/sparql_demo/data/data-xml/source-simple2/source-data-01.rdf');
On the other hand, simple filters do not work when the search criteria refer to triples that are affected by the modification. Consider a function that deletes all triples whose subjects are nodes of type 'http://xmlns.com/foaf/0.1/Person'. The type information is stored in triples that will themselves be deleted, so the simplest function is something like this:
create procedure DELETE_PERSONAL_DATA (in foaf_graph varchar)
{
  declare pdata_dict, pdata_array any;
  -- Step 1: select everything that should be deleted
  pdata_dict := ((
    sparql construct { ?s ?p ?o }
    WHERE { graph ?:foaf_graph {
        ?s ?p ?o .
        ?s rdf:type <http://xmlns.com/foaf/0.1/Person> } } ));
  -- Step 2: delete all found triples
  pdata_array := dict_list_keys (pdata_dict, 1);
  RDF_DELETE_TRIPLES (foaf_graph, pdata_array);
};

DELETE_PERSONAL_DATA ('http://local.virt/DAV/sparql_demo/data/data-xml/source-simple2/source-data-01.rdf');
From Virtuoso 5.0 onwards, applications can use SPARUL to do the same in a more convenient way:
create procedure DELETE_PERSONAL_DATA (in foaf_graph varchar)
{
  sparql delete { ?s ?p ?o }
  WHERE { graph ?:foaf_graph {
      ?s ?p ?o .
      ?s rdf:type <http://xmlns.com/foaf/0.1/Person> } };
};
The graph to be changed may be specified by an option preceding the query, instead of being specified in the 'insert into graph' clause.
SQL> SPARQL
DEFINE input:default-graph-uri <http://mygraph.com>
INSERT INTO <http://mygraph.com>
  { <http://myopenlink.net/dataspace/Kingsley#this>
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
    <http://rdfs.org/sioc/ns#User> };
callret-0
VARCHAR
_______________________________________________________________________________
Insert into <http://mygraph.com>, 1 triples -- done

1 Rows. -- 20 msec.
The following two statements are equivalent but the latter may work faster, especially if there are many RDF views in the system or if the graph in question contains triples from RDF views. Note that neither of these two statements affects data coming from RDF views.
SQL> SPARQL DELETE FROM GRAPH <http://mygraph.com> { ?s ?p ?o }
FROM <http://mygraph.com> WHERE { ?s ?p ?o };
callret-0
VARCHAR
_______________________________________________________________________________
Delete from <http://mygraph.com>, 1 triples -- done

1 Rows. -- 10 msec.

SQL> SPARQL CLEAR GRAPH <http://mygraph.com>;
callret-0
VARCHAR
__________________________________________________________
Clear <http://mygraph.com> -- done

1 Rows. -- 10 msec.
The following statement deletes all records with <http://myopenlink.net/dataspace/Kingsley#this> as the subject:
SQL> SPARQL DELETE FROM GRAPH <http://mygraph.com> { ?s ?p ?o }
FROM <http://mygraph.com>
WHERE { ?s ?p ?o . FILTER ( ?s = <http://myopenlink.net/dataspace/Kingsley#this> ) };
callret-0
VARCHAR
_______________________________________________________________________________
Delete from <http://mygraph.com>, 1 triples -- done

1 Rows. -- 10 msec.
Alternatively, the statement can be written in this way:
SQL> SPARQL DELETE FROM GRAPH <http://mygraph.com>
  { <http://myopenlink.net/dataspace/Kingsley#this> ?p ?o }
FROM <http://mygraph.com>
WHERE { <http://myopenlink.net/dataspace/Kingsley#this> ?p ?o };
callret-0
VARCHAR
_______________________________________________________________________________
Delete from <http://mygraph.com>, 1 triples -- done

1 Rows. -- 10 msec.
Keywords 'insert in' and 'insert into' are interchangeable in Virtuoso for backward compatibility, but the SPARUL specification lists only 'insert into'. For example, the statements below are equivalent:
SQL> SPARQL INSERT INTO GRAPH <http://mygraph.com>
  { <http://myopenlink.net/dataspace/Kingsley#this> <http://rdfs.org/sioc/ns#id> <Kingsley> };
callret-0
VARCHAR
______________________________________________________________________________
Insert into <http://mygraph.com>, 1 triples -- done

1 Rows. -- 0 msec.

SQL> SPARQL INSERT INTO GRAPH <http://mygraph.com>
  { <http://myopenlink.net/dataspace/Caroline#this>
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
    <http://rdfs.org/sioc/ns#User> };
callret-0
VARCHAR
_______________________________________________________________________________
Insert into <http://mygraph.com>, 1 triples -- done

1 Rows. -- 0 msec.

-- and

SQL> SPARQL INSERT IN GRAPH <http://mygraph.com>
  { <http://myopenlink.net/dataspace/Kingsley#this> <http://rdfs.org/sioc/ns#id> <Kingsley> };
callret-0
VARCHAR
_______________________________________________________________________________
Insert into <http://mygraph.com>, 1 triples -- done

1 Rows. -- 10 msec.

SQL> SPARQL INSERT IN GRAPH <http://mygraph.com>
  { <http://myopenlink.net/dataspace/Caroline#this>
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
    <http://rdfs.org/sioc/ns#User> };
callret-0
VARCHAR
________________________________________________________________________
Insert into <http://mygraph.com>, 1 triples -- done

1 Rows. -- 0 msec.
It is possible to use various expressions to calculate fields of new triples. This is very convenient, even if not a part of the original specification.
SQL> SPARQL INSERT INTO GRAPH <http://mygraph.com>
  { ?s <http://rdfs.org/sioc/ns#id> `iri (bif:concat (str (?o), "Idehen"))` }
WHERE { ?s <http://rdfs.org/sioc/ns#id> ?o };
callret-0
VARCHAR
_______________________________________________________________________________
Insert into <http://mygraph.com>, 4 triples -- done

1 Rows. -- 0 msec.
The following example shows how to find which predicate/object pairs the subjects below have in common, and counts the occurrences:

http://dbpedia.org/resource/Climate_change
http://dbpedia.org/resource/Disaster_risk_reduction
http://dbpedia.org/resource/Tanzania
http://dbpedia.org/resource/Capacity_building
http://dbpedia.org/resource/Poverty
http://dbpedia.org/resource/Construction
http://dbpedia.org/resource/Vulnerability
http://dbpedia.org/resource/Mount_Kilimanjaro
http://dbpedia.org/resource/Social_vulnerability
The following query returns the desired results:
SPARQL
SELECT ?s1 ?s2 COUNT (1)
WHERE
  {
    ?s1 ?p ?o .
    FILTER (?s1 IN (
      <http://dbpedia.org/resource/Climate_change>,
      <http://dbpedia.org/resource/Disaster_risk_reduction>,
      <http://dbpedia.org/resource/Tanzania>,
      <http://dbpedia.org/resource/Capacity_building>,
      <http://dbpedia.org/resource/Poverty>,
      <http://dbpedia.org/resource/Construction>,
      <http://dbpedia.org/resource/Vulnerability>,
      <http://dbpedia.org/resource/Mount_Kilimanjaro>,
      <http://dbpedia.org/resource/Social_vulnerability> ))
    ?s2 ?p ?o .
    FILTER (?s2 IN (
      <http://dbpedia.org/resource/Climate_change>,
      <http://dbpedia.org/resource/Disaster_risk_reduction>,
      <http://dbpedia.org/resource/Tanzania>,
      <http://dbpedia.org/resource/Capacity_building>,
      <http://dbpedia.org/resource/Poverty>,
      <http://dbpedia.org/resource/Construction>,
      <http://dbpedia.org/resource/Vulnerability>,
      <http://dbpedia.org/resource/Mount_Kilimanjaro>,
      <http://dbpedia.org/resource/Social_vulnerability> ))
    FILTER (?s1 != ?s2)
    FILTER (str (?s1) < str (?s2))
  }
LIMIT 20
The result of executing the query:
s1  s2  callret-2
http://dbpedia.org/resource/Climate_change  http://dbpedia.org/resource/Tanzania  2
http://dbpedia.org/resource/Social_vulnerability  http://dbpedia.org/resource/Vulnerability  1
http://dbpedia.org/resource/Mount_Kilimanjaro  http://dbpedia.org/resource/Poverty  1
http://dbpedia.org/resource/Mount_Kilimanjaro  http://dbpedia.org/resource/Tanzania  3
http://dbpedia.org/resource/Capacity_building  http://dbpedia.org/resource/Disaster_risk_reduction  1
http://dbpedia.org/resource/Poverty  http://dbpedia.org/resource/Tanzania  1
'Modify graph' may be used as a form of 'update' operation.
SQL> SPARQL MODIFY GRAPH <http://mygraph.com>
DELETE { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o }
INSERT { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type1> ?o }
WHERE { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o };

SQL> SPARQL DELETE FROM GRAPH <http://mygraph.com>
  { <http://myopenlink.net/dataspace/Caroline#this>
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type1>
    <http://rdfs.org/sioc/ns#User> };
The RDF information resource URI can be generated via a string expression.
<http://www.openlinksw.com/dataspace/kidehen@openlinksw.com#this> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/sioc/ns#User> .
<http://www.openlinksw.com/dataspace/kidehen@openlinksw.com#this> <http://www.w3.org/2000/01/rdf-schema#label> "Kingsley" .
<http://www.openlinksw.com/dataspace/kidehen@openlinksw.com#this> <http://rdfs.org/sioc/ns#creator_of> <http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1300> .
Figure 14.3.2.3.8.1: Generating RDF information resource URI
SQL> SPARQL CLEAR GRAPH <http://mygraph.com>;
callret-0
VARCHAR
_____________________________________________________________________
Clear <http://mygraph.com> -- done

1 Rows. -- 10 msec.
SQL> SPARQL load bif:concat ("http://", bif:registry_get ("URIQADefaultHost"), "/DAV/n3_collection/kidehen.n3")
INTO GRAPH <http://mygraph.com>;
callret-0
VARCHAR
_______________________________________________________________________________
Load <http://localhost:8890/DAV/n3_collection/kidehen.n3> into graph <http://mygraph.com> -- done

1 Rows. -- 30 msec.
SQL> SPARQL SELECT * FROM <http://mygraph.com> WHERE { ?s ?p ?o };
s  p  o
VARCHAR  VARCHAR  VARCHAR
_______________________________________________________________________________
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com#this  http://www.w3.org/1999/02/22-rdf-syntax-ns#type  http://rdfs.org/sioc/ns#User
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com#this  http://rdfs.org/sioc/ns#creator_of  http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1300
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com#this  http://www.w3.org/2000/01/rdf-schema#label  Kingsley

3 Rows. -- 10 msec.
Several operations can be sent to a web service endpoint as a single statement and executed in sequence.
SQL> SPARQL INSERT IN GRAPH <http://mygraph.com>
  { <http://myopenlink.net/dataspace/Kingsley#this> <http://rdfs.org/sioc/ns#id> <Kingsley> }
INSERT INTO GRAPH <http://mygraph.com>
  { <http://myopenlink.net/dataspace/Caroline#this>
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
    <http://rdfs.org/sioc/ns#User> }
INSERT INTO GRAPH <http://mygraph.com>
  { ?s <http://rdfs.org/sioc/ns#id> `iri (bif:concat (str (?o), "Idehen"))` }
WHERE { ?s <http://rdfs.org/sioc/ns#id> ?o };
callret-0
VARCHAR
_______________________________________________________________________________
Insert into <http://mygraph.com>, 1 triples -- done
Insert into <http://mygraph.com>, 1 triples -- done
Insert into <http://mygraph.com>, 8 triples -- done
Commit -- done

1 Rows. -- 10 msec.

SQL> SPARQL MODIFY GRAPH <http://mygraph.com>
DELETE { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o }
INSERT { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type1> ?o }
WHERE { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o };

SQL> SPARQL DELETE FROM GRAPH <http://mygraph.com>
  { <http://myopenlink.net/dataspace/Caroline#this>
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type1>
    <http://rdfs.org/sioc/ns#User> };

SQL> SPARQL load bif:concat ("http://", bif:registry_get ("URIQADefaultHost"), "/DAV/n3_collection/kidehen.n3")
INTO GRAPH <http://mygraph.com>;
When handling very large RDF data collections (e.g. 600 million triples) loaded into a Virtuoso server as a single graph, the fastest way to drop the graph is:
SQL> SPARQL CLEAR GRAPH <http://mygraph.com>;
callret-0
VARCHAR
______________________________________________________________________________
Clear <http://mygraph.com> -- done

1 Rows. -- 10 msec.
The operation can be sped up by executing log_enable (0) or even log_enable (2) beforehand, and log_enable (1) after it completes.
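A sketch of that sequence (log_enable (2) disables transaction logging and enables row-by-row autocommit; this trades recoverability for speed, so use with care):

```sql
log_enable (2);                            -- no transaction log, row-autocommit
SPARQL CLEAR GRAPH <http://mygraph.com>;
log_enable (1);                            -- restore normal logging
```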
The procedure below handles simple cases of comparing graphs that contain bnodes:
-- Fast Approximate RDF Graph Equivalence Test
-- (C) 2009 OpenLink Software
-- License: GNU General Public License (only version 2 of the license).
-- No warranty, even implied warranty
-- This compares the content of triple dictionaries \c dict1 and \c dict2,
-- returns NULL if no difference found (with bnode equivalence in mind),
-- returns description of a difference otherwise.
-- The function is experimental (note suffix _EXP), so no accurate QA is made.
-- Some version of the function may be inserted later in OpenLink Virtuoso Server under some different name.
create function DB.DBA.RDF_TRIPLE_DICTS_DIFFER_EXP (
  in dict1 any,              --- Triple dictionary, traditional (vectors of S, P, O are keys, any non-nulls are values)
  in dict2 any,              --- Second triple dictionary, like \c dict1
  in accuracy integer,       --- Accuracy, 0 if no bnodes expected, 1 if "convenient" trees with intermediate bnodes expected, 2 and more are not yet implemented
  in equiv_map any := null,  --- If specified then it contains a mapping from IRI_IDs of bnodes of \c dict1 to equivalent IRI_IDs of bnodes of \c dict2.
                             --- It can be extended during the run, so use dict_duplicate() before the call if needed.
  in equiv_rev any := null   --- If specified then it is an inverted dictionary of \c equiv_map (this time \c dict2 bnodes are keys and \c dict1 bnodes are values)
  )
{
  declare dict_size1, dict_size2 integer;
  declare old_dirt_level, dirt_level integer;
  declare ctr, tailctr, sp_made_new_equiv integer;
  declare array1, array2, dict2_sp, dict1_op, dict2_op, array1_op any;
  dict_size1 := dict_size (dict1);
  dict_size2 := dict_size (dict2);
  dict2 := dict_duplicate (dict2);
  if (dict_size1 <> dict_size2)
    return 'Sizes differ';
  if (equiv_map is null)
    {
      equiv_map := dict_new (dict_size1);
      equiv_rev := dict_new (dict_size1);
    }
  old_dirt_level := dict_size1 - dict_size (equiv_map);
  array1 := dict_list_keys (dict1, 0);
next_loop:
  -- Step 1: removing triples with all three items matched
  ctr := dict_size1-1;
  while (ctr >= 0)
    {
      declare s_in_1, o_in_1, s_in_2, o_in_2, triple_in_2 any;
      s_in_1 := array1[ctr][0];
      o_in_1 := array1[ctr][2];
      if (is_bnode_iri_id (s_in_1))
        {
          s_in_2 := dict_get (equiv_map, s_in_1, null);
          if (s_in_2 is null) goto next_full_eq_check;
        }
      else
        s_in_2 := s_in_1;
      if (is_bnode_iri_id (o_in_1))
        {
          o_in_2 := dict_get (equiv_map, o_in_1, null);
          if (o_in_2 is null) goto next_full_eq_check;
        }
      else
        o_in_2 := o_in_1;
      triple_in_2 := vector (s_in_2, array1[ctr][1], o_in_2);
      if (dict_get (dict2, triple_in_2, null) is null)
        return vector (array1[ctr], ' is in first, ', triple_in_2, ' is missing in second');
      dict_remove (dict2, triple_in_2);
      if (ctr < dict_size1-1)
        array1[ctr] := array1[dict_size1-1];
      dict_size1 := dict_size1-1;
next_full_eq_check:
      ctr := ctr-1;
    }
  -- Step 1 end, garbage truncated:
  if ((0 = dict_size1) or (0 = accuracy))
    return null;
  if (dict_size1 < length (array1))
    array1 := subseq (array1, 0, dict_size1);
  if (dict_size (dict2) <> dict_size1)
    signal ('OBLOM', 'Internal error: sizes of graphs suddenly differ');
  -- Step 2: establishing equivs between not-yet-coupled bnodes that are values of functional predicates of coupled subjects
  sp_made_new_equiv := 0;
  dict2_sp := dict_new (dict_size1);
  array2 := dict_list_keys (dict2, 0);
  for (ctr := dict_size1-1; ctr >= 0; ctr := ctr-1)
    {
      declare sp2, o2, prev_uniq_o2 any;
      sp2 := vector (array2[ctr][0], array2[ctr][1]);
      prev_uniq_o2 := dict_get (dict2_sp, sp2, null);
      if (prev_uniq_o2 is null)
        {
          o2 := array2[ctr][2];
          if (is_bnode_iri_id (o2))
            dict_put (dict2_sp, sp2, o2);
          else
            dict_put (dict2_sp, sp2, #i0);
        }
      else if (prev_uniq_o2 <> #i0)
        dict_put (dict2_sp, sp2, #i0);
    }
  rowvector_subj_sort (array1, 0, 1);
  rowvector_subj_sort (array1, 1, 1);
  rowvector_subj_sort (array2, 1, 1);
  ctr := 0;
  while (ctr < dict_size1)
    {
      declare s_in_1, o_in_1, s_in_2, o_in_2, o_in_dict2_sp, o_in_dict2_sp_in_1 any;
      tailctr := ctr+1;
      if (array1[ctr][1] <> array2[ctr][1])
        {
          if (array1[ctr][1] > array2[ctr][1])
            return vector ('Cardinality of predicate ', array2[ctr][1], ' is greater in second than in first');
          else
            return vector ('Cardinality of predicate ', array1[ctr][1], ' is greater in first than in second');
        }
      while ((tailctr < dict_size1)
          and (array1[tailctr][0] = array1[ctr][0])
          and (array1[tailctr][1] = array1[ctr][1]) )
        tailctr := tailctr+1;
      if ((tailctr - ctr) > 1)
        goto next_sp_check;
      o_in_1 := array1[ctr][2];
      if (not is_bnode_iri_id (o_in_1))
        goto next_sp_check;
      o_in_2 := dict_get (equiv_map, o_in_1, null);
      if (o_in_2 is not null)
        goto next_sp_check;
      s_in_1 := array1[ctr][0];
      if (is_bnode_iri_id (s_in_1))
        {
          s_in_2 := dict_get (equiv_map, s_in_1, null);
          if (s_in_2 is null) goto next_sp_check;
        }
      else
        s_in_2 := s_in_1;
      o_in_dict2_sp := dict_get (dict2_sp, vector (s_in_2, array1[ctr][1]), null);
      if (o_in_dict2_sp is null)
        return vector (vector (s_in_1, array1[ctr][1], o_in_1), ' is unique SP in first, ',
          vector (s_in_2, array1[ctr][1]), ' is missing SP in second');
      if (o_in_dict2_sp = #i0)
        return vector (vector (s_in_1, array1[ctr][1], o_in_1), ' is unique SP in first, ',
          vector (s_in_2, array1[ctr][1]), ' is not unique SP-to-bnode in second');
      o_in_dict2_sp_in_1 := dict_get (equiv_rev, o_in_dict2_sp, null);
      if
(o_in_dict2_sp_in_1 is not null) { if (o_in_dict2_sp_in_1 = o_in_1) goto next_sp_check; return vector (vector (s_in_1, array1[ctr][1], o_in_1), ' is unique SP in first, ', vector (s_in_2, array1[ctr][1], o_in_dict2_sp), ' is unique SP in second but ', o_in_dict2_sp, ' rev-equiv to ', o_in_dict2_sp_in_1); } dict_put (equiv_map, o_in_1, o_in_dict2_sp); dict_put (equiv_rev, o_in_dict2_sp, o_in_1); sp_made_new_equiv := sp_made_new_equiv + 1; next_sp_check: ctr := tailctr; } dict_list_keys (dict2_sp, 2); -- Step 2 end if (sp_made_new_equiv * 10 > dict_size1) goto next_loop; -- If dictionary is noticeably extended then it's worth to remove more triples before continue. -- Step 3: establishing equivs between not-yet-coupled bnodes that are subjects of inverse functional properties with coupled objects. dict1_op := dict_new (dict_size1); for (ctr := dict_size1-1; ctr >= 0; ctr := ctr-1) { declare op1, s1, prev_uniq_s1 any; op1 := vector (array1[ctr][2], array1[ctr][1]); prev_uniq_s1 := dict_get (dict1_op, op1, null); if (prev_uniq_s1 is null) { s1 := array1[ctr][0]; if (is_bnode_iri_id (s1)) dict_put (dict1_op, op1, s1); else dict_put (dict1_op, op1, #i0); } else if (prev_uniq_s1 <> #i0) dict_put (dict1_op, op1, #i0); } array1_op := dict_to_vector (dict1_op, 2); dict2_op := dict_new (dict_size1); for (ctr := dict_size1-1; ctr >= 0; ctr := ctr-1) { declare op2, s2, prev_uniq_s2 any; op2 := vector (array2[ctr][2], array2[ctr][1]); prev_uniq_s2 := dict_get (dict2_op, op2, null); if (prev_uniq_s2 is null) { s2 := array2[ctr][0]; if (is_bnode_iri_id (s2)) dict_put (dict2_op, op2, s2); else dict_put (dict2_op, op2, #i0); } else if (prev_uniq_s2 <> #i0) dict_put (dict2_op, op2, #i0); } ctr := length (array1_op) - 2; while (ctr >= 0) { declare o_in_1, s_in_1, o_in_2, s_in_2, s_in_dict2_op, s_in_dict2_op_in_1 any; s_in_1 := array1_op[ctr+1]; if (not is_bnode_iri_id (s_in_1)) goto next_op_check; s_in_2 := dict_get (equiv_map, s_in_1, null); if (s_in_2 is not null) goto 
next_op_check; o_in_1 := array1_op[ctr][0]; if (is_bnode_iri_id (o_in_1)) { o_in_2 := dict_get (equiv_map, o_in_1, null); if (o_in_2 is null) goto next_op_check; } else o_in_2 := o_in_1; s_in_dict2_op := dict_get (dict2_op, vector (o_in_2, array1_op[ctr][1]), null); if (s_in_dict2_op is null) return vector (vector (s_in_1, array1_op[ctr][1], o_in_1), ' is unique OP in first, ', vector (o_in_2, array1_op[ctr][1]), ' is missing OP in second'); if (s_in_dict2_op = #i0) return vector (vector (s_in_1, array1_op[ctr][1], o_in_1), ' is unique OP in first, ', vector (o_in_2, array1_op[ctr][1]), ' is not unique OP-to-bnode in second'); s_in_dict2_op_in_1 := dict_get (equiv_rev, s_in_dict2_op, null); if (s_in_dict2_op_in_1 is not null) { if (s_in_dict2_op_in_1 = s_in_1) goto next_op_check; return vector (vector (s_in_1, array1_op[ctr][1], o_in_1), ' is unique OP in first, ', vector (s_in_dict2_op, array1[ctr][1], o_in_2), ' is unique OP in second but ', s_in_dict2_op, ' rev-equiv to ', s_in_dict2_op_in_1); } dict_put (equiv_map, s_in_1, s_in_dict2_op); dict_put (equiv_rev, s_in_dict2_op, s_in_1); next_op_check: ctr := ctr - 2; } dict_list_keys (dict2_op, 2); -- Step 3 end dirt_level := dict_size1 - dict_size (equiv_map); if (dirt_level >= old_dirt_level) return vector (vector (array1[0][0], array1[0][1], array1[0][2]), ' has no matches in second with the requested accuracy'); old_dirt_level := dirt_level; goto next_loop; } ; create function DB.DBA.RDF_GRAPHS_DIFFER_EXP (in g1_uri varchar, in g2_uri varchar, in accuracy integer) { return DB.DBA.RDF_TRIPLE_DICTS_DIFFER_EXP ( (sparql define output:valmode "LONG" construct { ?s ?p ?o } where { graph `iri(?:g1_uri)` { ?s ?p ?o }}), (sparql define output:valmode "LONG" construct { ?s ?p ?o } where { graph `iri(?:g2_uri)` { ?s ?p ?o }}), accuracy ); } ; -- The rest of file contains some minimal tests. 
set verbose off; set banner off; set types off; create function DB.DBA.DICT_EXTEND_WITH_KEYS (in dict any, in keys any) { if (dict is null) dict := dict_new (length (keys)); foreach (any k in keys) do dict_put (dict, k, 1); return dict; } ; create function DB.DBA.TEST_RDF_TRIPLE_DICTS_DIFFER_EXP (in title varchar, in should_differ integer, in v1 any, in v2 any, in accuracy integer) { declare d1, d2, eqm, eqr, differ_status any; d1 := DB.DBA.DICT_EXTEND_WITH_KEYS (null, v1); d2 := DB.DBA.DICT_EXTEND_WITH_KEYS (null, v2); eqm := dict_new (10); eqr := dict_new (10); dbg_obj_princ ('===== ' || title); differ_status := DB.DBA.RDF_TRIPLE_DICTS_DIFFER_EXP (d1, d2, accuracy, eqm, eqr); dbg_obj_princ ('Result: ', differ_status); if (0 < dict_size (eqm)) dbg_obj_princ ('Equivalence map: ', dict_to_vector (eqm, 0)); dbg_obj_princ ('Equivalence rev: ', dict_to_vector (eqr, 0)); return sprintf ('%s: %s', case when (case when should_differ then equ (0, isnull (differ_status)) else isnull (differ_status) end) then 'PASSED' else '***FAILED' end, title ); } ; create function DB.DBA.TEST_RDF_GRAPHS_DIFFER_EXP (in title varchar, in should_differ integer, in g1_uri varchar, in g2_uri varchar, in accuracy integer) { declare differ_status any; differ_status := DB.DBA.RDF_GRAPHS_DIFFER_EXP (g1_uri, g2_uri, accuracy); dbg_obj_princ ('Result: ', differ_status); return sprintf ('%s: %s', case when (case when should_differ then equ (0, isnull (differ_status)) else isnull (differ_status) end) then 'PASSED' else '***FAILED' end, title ); } ; select DB.DBA.TEST_RDF_TRIPLE_DICTS_DIFFER_EXP ( 'Identical graphs', 0, vector ( vector (#i100, #i200, #i300), vector (#i100, #i200, 1) ), vector ( vector (#i100, #i200, #i300), vector (#i100, #i200, 1) ), 100 ); select DB.DBA.TEST_RDF_TRIPLE_DICTS_DIFFER_EXP ( 'Sizes differ', 1, vector ( vector (#i100, #i200, #i300), vector (#i100, #i200, 1) ), vector ( vector (#i100, #i200, #i300), vector (#i100, #i200, 1), vector (#i101, #i201, #i301) ), 100 ); select 
DB.DBA.TEST_RDF_TRIPLE_DICTS_DIFFER_EXP ( 'Cardinality of a pred differ', 1, vector ( vector (#i100, #i200, #ib300), vector (#i101, #i200, #ib302), vector (#i103, #i201, #ib304), vector (#ib109, #i200, #ib109) ), vector ( vector (#i100, #i200, #ib301), vector (#i101, #i200, #ib303), vector (#i103, #i201, #ib305), vector (#ib109, #i201, #ib109) ), 100 ); select DB.DBA.TEST_RDF_TRIPLE_DICTS_DIFFER_EXP ( 'Bnodes in O with unique SP (equiv)', 0, vector ( vector (#i100, #i200, #i300), vector (#i100, #i201, #ib301), vector (#i101, #i201, #ib301), vector (#i102, #i202, #ib303), vector (#ib303, #i204, #i306), vector (#ib303, #i205, #ib305), vector (#i100, #i200, 1) ), vector ( vector (#i100, #i200, #i300), vector (#i100, #i201, #ib302), vector (#i101, #i201, #ib302), vector (#i102, #i202, #ib304), vector (#ib304, #i204, #i306), vector (#ib304, #i205, #ib306), vector (#i100, #i200, 1) ), 100 ); select DB.DBA.TEST_RDF_TRIPLE_DICTS_DIFFER_EXP ( 'Bnodes in O with unique SP (diff 1)', 1, vector ( vector (#i100, #i200, #i300), vector (#i100, #i201, #ib301), vector (#i102, #i202, #ib303), vector (#ib303, #i204, #i306), vector (#ib303, #i205, #ib305), vector (#i100, #i200, 1) ), vector ( vector (#i100, #i200, #i300), vector (#i100, #i201, #ib302), vector (#i102, #i202, #ib304), vector (#ib304, #i204, #i306), vector (#ib304, #i205, #i306), vector (#i100, #i200, 1) ), 100 ); select DB.DBA.TEST_RDF_TRIPLE_DICTS_DIFFER_EXP ( 'Bnodes in O with unique SP (diff 2)', 1, vector ( vector (#i100, #i200, #i300), vector (#i100, #i201, #ib301), vector (#i102, #i202, #ib303), vector (#ib303, #i204, #i306), vector (#ib303, #i205, #ib305), vector (#i100, #i200, 1) ), vector ( vector (#i100, #i200, #i300), vector (#i100, #i201, #ib302), vector (#i102, #i202, #ib304), vector (#ib304, #i204, #i306), vector (#ib304, #i205, #ib304), vector (#i100, #i200, 1) ), 100 ); select DB.DBA.TEST_RDF_TRIPLE_DICTS_DIFFER_EXP ( 'foaf-like-mix (equiv)', 0, vector ( vector (#i100, #i200, #i300), vector (#i100, #i201, 
#ib301), vector (#i100, #i201, #ib303), vector (#i100, #i201, #ib305), vector (#i100, #i201, #ib307), vector (#ib301, #i202, 'Anna'), vector (#ib303, #i202, 'Anna'), vector (#ib305, #i202, 'Brigit'), vector (#ib307, #i202, 'Clara'), vector (#ib301, #i203, 'ann@ex.com'), vector (#ib303, #i203, 'ann@am.com'), vector (#ib305, #i203, 'root@ple.com'), vector (#ib307, #i203, 'root@ple.com') ), vector ( vector (#i100, #i200, #i300), vector (#i100, #i201, #ib302), vector (#i100, #i201, #ib304), vector (#i100, #i201, #ib306), vector (#i100, #i201, #ib308), vector (#ib302, #i202, 'Anna'), vector (#ib304, #i202, 'Anna'), vector (#ib306, #i202, 'Brigit'), vector (#ib308, #i202, 'Clara'), vector (#ib302, #i203, 'ann@ex.com'), vector (#ib304, #i203, 'ann@am.com'), vector (#ib306, #i203, 'root@ple.com'), vector (#ib308, #i203, 'root@ple.com') ), 100 ); select DB.DBA.TEST_RDF_TRIPLE_DICTS_DIFFER_EXP ( 'foaf-like-mix (swapped names)', 1, vector ( vector (#i100, #i200, #i300), vector (#i100, #i201, #ib301), vector (#i100, #i201, #ib303), vector (#i100, #i201, #ib305), vector (#i100, #i201, #ib307), vector (#ib301, #i202, 'Anna'), vector (#ib303, #i202, 'Anna'), vector (#ib305, #i202, 'Brigit'), vector (#ib307, #i202, 'Clara'), vector (#ib301, #i203, 'ann@ex.com'), vector (#ib303, #i203, 'ann@am.com'), vector (#ib305, #i203, 'root@ple.com'), vector (#ib307, #i203, 'root@ple.com') ), vector ( vector (#i100, #i200, #i300), vector (#i100, #i201, #ib302), vector (#i100, #i201, #ib304), vector (#i100, #i201, #ib306), vector (#i100, #i201, #ib308), vector (#ib302, #i202, 'Anna'), vector (#ib304, #i202, 'Brigit'), vector (#ib306, #i202, 'Anna'), vector (#ib308, #i202, 'Clara'), vector (#ib302, #i203, 'ann@ex.com'), vector (#ib304, #i203, 'ann@am.com'), vector (#ib306, #i203, 'root@ple.com'), vector (#ib308, #i203, 'root@ple.com') ), 100 ); select DB.DBA.TEST_RDF_TRIPLE_DICTS_DIFFER_EXP ( 'foaf-like-mix (swapped names)', 1, vector ( vector (#i100, #i200, #i300), vector (#i100, #i201, 
#ib301), vector (#i100, #i201, #ib303), vector (#i100, #i201, #ib305), vector (#i100, #i201, #ib307), vector (#ib301, #i202, 'Anna'), vector (#ib303, #i202, 'Anna'), vector (#ib305, #i202, 'Brigit'), vector (#ib307, #i202, 'Clara'), vector (#ib301, #i203, 'ann@ex.com'), vector (#ib303, #i203, 'ann@am.com'), vector (#ib305, #i203, 'root@ple.com'), vector (#ib307, #i203, 'root@ple.com') ), vector ( vector (#i100, #i200, #i300), vector (#i100, #i201, #ib302), vector (#i100, #i201, #ib304), vector (#i100, #i201, #ib306), vector (#i100, #i201, #ib308), vector (#ib302, #i202, 'Anna'), vector (#ib304, #i202, 'Brigit'), vector (#ib306, #i202, 'Anna'), vector (#ib308, #i202, 'Clara'), vector (#ib302, #i203, 'ann@ex.com'), vector (#ib304, #i203, 'ann@am.com'), vector (#ib306, #i203, 'root@ple.com'), vector (#ib308, #i203, 'root@ple.com') ), 100 ); select DB.DBA.TEST_RDF_TRIPLE_DICTS_DIFFER_EXP ( 'bnodes only (equiv that can not be proven)', 1, vector ( vector (#ib101, #i200, #ib103), vector (#ib103, #i201, #ib101) ), vector ( vector (#ib102, #i200, #ib104), vector (#ib104, #i201, #ib102) ), 100 ); sparql clear graph <http://GraphCmp/One>; TTLP ('@prefix foaf: <http://i-dont-remember-it> . _:me a foaf:Person ; foaf:knows [ foaf:nick "oerling" ; foaf:title "Mr." ; foaf:sha1 "abra" ] ; foaf:knows [ foaf:nick "kidehen" ; foaf:title "Mr." ; foaf:sha1 "bra" ] ; foaf:knows [ foaf:nick "aldo" ; foaf:title "Mr." ; foaf:sha1 "cada" ] .', '', 'http://GraphCmp/One' ); sparql clear graph <http://GraphCmp/Two>; TTLP ('@prefix foaf: <http://i-dont-remember-it> . _:iv foaf:knows [ foaf:title "Mr." ; foaf:sha1 "cada" ; foaf:nick "aldo" ] ; foaf:knows [ foaf:sha1 "bra" ; foaf:title "Mr." ; foaf:nick "kidehen" ] ; foaf:knows [ foaf:nick "oerling" ; foaf:sha1 "abra" ; foaf:title "Mr." 
] ; a foaf:Person .', '', 'http://GraphCmp/Two' );

select DB.DBA.TEST_RDF_GRAPHS_DIFFER_EXP ( 'nonexisting graphs (equiv, of course)', 0,
  'http://GraphCmp/NoSuch', 'http://GraphCmp/NoSuch', 100 );

select DB.DBA.TEST_RDF_GRAPHS_DIFFER_EXP ( 'thorough test on foafs (equiv)', 0,
  'http://GraphCmp/One', 'http://GraphCmp/Two', 100 );
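The core idea of Step 1 above — removing triples whose bnodes are already coupled through the equivalence map and which have an exact counterpart in the second dictionary — can be sketched in plain Python. This is an illustrative, stdlib-only sketch; the function name, the tuple representation, and the `_:` bnode convention are assumptions, not Virtuoso APIs.

```python
def is_bnode(term):
    # Hypothetical convention: bnodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")

def triples_differ(set1, set2, equiv_map):
    """Return None if no difference is provable, else a description.

    Mirrors Step 1 of the Virtuoso function: a triple of the first set is
    removed when its subject/object bnodes are already mapped and the mapped
    triple exists in the second set; uncoupled bnodes are skipped (a fuller
    version would then try to couple them, as Steps 2 and 3 do).
    """
    if len(set1) != len(set2):
        return "Sizes differ"
    remaining2 = set(set2)
    for s, p, o in set1:
        if is_bnode(s) and s not in equiv_map:
            continue  # subject bnode not yet coupled
        if is_bnode(o) and o not in equiv_map:
            continue  # object bnode not yet coupled
        s2 = equiv_map.get(s, s)
        o2 = equiv_map.get(o, o)
        if (s2, p, o2) not in remaining2:
            return f"{(s, p, o)} is in first, {(s2, p, o2)} is missing in second"
        remaining2.discard((s2, p, o2))
    return None
```

For example, `triples_differ([("a", "p", "_:x")], [("a", "p", "_:y")], {"_:x": "_:y"})` reports no difference because the mapped triple matches.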
SQL>SPARQL INSERT INTO GRAPH <http://BookStore.com> { <http://www.dajobe.org/foaf.rdf#i> <http://purl.org/dc/elements/1.1/title> "SPARQL and RDF" . <http://www.dajobe.org/foaf.rdf#i> <http://purl.org/dc/elements/1.1/date> <1999-01-01T00:00:00>. <http://www.w3.org/People/Berners-Lee/card#i> <http://purl.org/dc/elements/1.1/title> "Design notes" . <http://www.w3.org/People/Berners-Lee/card#i> <http://purl.org/dc/elements/1.1/date> <2001-01-01T00:00:00>. <http://www.w3.org/People/Connolly/#me> <http://purl.org/dc/elements/1.1/title> "Fundamentals of Compiler Design" . <http://www.w3.org/People/Connolly/#me> <http://purl.org/dc/elements/1.1/date> <2002-01-01T00:00:00>. }; callret-0 VARCHAR _________________________________________________________________ Insert into <http://BookStore.com>, 6 triples -- done 1 Rows. -- 0 msec.
A SPARQL/Update request that contains a triple to be deleted and a triple to be added (used here to correct a book title).
SQL>SPARQL MODIFY GRAPH <http://BookStore.com> DELETE { <http://www.w3.org/People/Connolly/#me> <http://purl.org/dc/elements/1.1/title> "Fundamentals of Compiler Design" } INSERT { <http://www.w3.org/People/Connolly/#me> <http://purl.org/dc/elements/1.1/title> "Fundamentals" }; callret-0 VARCHAR _______________________________________________________________________________ Modify <http://BookStore.com>, delete 1 and insert 1 triples -- done 1 Rows. -- 20 msec.
The example below deletes all records of old books (dated before the year 2000):
SQL>SPARQL PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> DELETE FROM GRAPH <http://BookStore.com> { ?book ?p ?v } WHERE { GRAPH <http://BookStore.com> { ?book dc:date ?date FILTER ( xsd:dateTime(?date) < xsd:dateTime("2000-01-01T00:00:00")). ?book ?p ?v. } }; _______________________________________________________________________________ Delete from <http://BookStore.com>, 6 triples -- done 1 Rows. -- 10 msec.
The next snippet copies records from one named graph to another based on a pattern:
SQL>SPARQL clear graph <http://BookStore.com>; SQL>SPARQL clear graph <http://NewBookStore.com>; SQL>SPARQL insert in graph <http://BookStore.com> { <http://www.dajobe.org/foaf.rdf#i> <http://purl.org/dc/elements/1.1/date> <1999-04-01T00:00:00> . <http://www.w3.org/People/Berners-Lee/card#i> <http://purl.org/dc/elements/1.1/date> <1998-05-03T00:00:00> . <http://www.w3.org/People/Connolly/#me> <http://purl.org/dc/elements/1.1/date> <2001-02-08T00:00:00> }; SQL>SPARQL PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> INSERT INTO GRAPH <http://NewBookStore.com> { ?book ?p ?v } WHERE { GRAPH <http://BookStore.com> { ?book dc:date ?date FILTER ( xsd:dateTime(?date) > xsd:dateTime("2000-01-01T00:00:00")). ?book ?p ?v. } }; callret-0 VARCHAR _______________________________________________________________________________ Insert into <http://NewBookStore.com>, 6 triples -- done 1 Rows. -- 30 msec.
This example moves records from one named graph to another named graph based on a pattern:
SQL>SPARQL PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> INSERT INTO GRAPH <http://NewBookStore.com> { ?book ?p ?v } WHERE { GRAPH <http://BookStore.com> { ?book dc:date ?date . FILTER ( xsd:dateTime(?date) > xsd:dateTime("2000-01-01T00:00:00")). ?book ?p ?v. } }; _______________________________________________________________________________ Insert into <http://NewBookStore.com>, 6 triples -- done 1 Rows. -- 10 msec. SQL>SPARQL PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> DELETE FROM GRAPH <http://BookStore.com> { ?book ?p ?v } WHERE { GRAPH <http://BookStore.com> { ?book dc:date ?date . FILTER ( xsd:dateTime(?date) > xsd:dateTime("2000-01-01T00:00:00")). ?book ?p ?v. } }; _______________________________________________________________________________ Delete from <http://BookStore.com>, 3 triples -- done 1 Rows. -- 10 msec.
## All programmes related to James Bond:
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?uri ?label
WHERE {
  ?uri po:category
    <http://www.bbc.co.uk/programmes/people/bmFtZS9ib25kLCBqYW1lcyAobm8gcXVhbGlmaWVyKQ#person> ;
    rdfs:label ?label.
}
## Find all Eastenders broadcasts after 2009-01-01,
## along with the broadcast version & type
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>
PREFIX po: <http://purl.org/ontology/po/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?version_type ?broadcast_start
WHERE {
  <http://www.bbc.co.uk/programmes/b006m86d#programme> po:episode ?episode .
  ?episode po:version ?version .
  ?version a ?version_type .
  ?broadcast po:broadcast_of ?version .
  ?broadcast event:time ?time .
  ?time tl:start ?broadcast_start .
  FILTER ( (?version_type != <http://purl.org/ontology/po/Version>)
        && (?broadcast_start > "2009-01-01T00:00:00Z"^^xsd:dateTime) )
}
## Find all programmes that featured both the Foo Fighters and Al Green
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?programme ?label
WHERE {
  ?event1 po:track ?track1 .
  ?track1 foaf:maker ?maker1 .
  ?maker1 owl:sameAs <http://www.bbc.co.uk/music/artists/67f66c07-6e61-4026-ade5-7e782fad3a5d#artist> .
  ?event2 po:track ?track2 .
  ?track2 foaf:maker ?maker2 .
  ?maker2 owl:sameAs <http://www.bbc.co.uk/music/artists/fb7272ba-f130-4f0a-934d-6eeea4c18c9a#artist> .
  ?event1 event:time ?t1 .
  ?event2 event:time ?t2 .
  ?t1 tl:timeline ?tl .
  ?t2 tl:timeline ?tl .
  ?version po:time ?t .
  ?t tl:timeline ?tl .
  ?programme po:version ?version .
  ?programme rdfs:label ?label .
}
## Get short synopses of EastEnders episodes
PREFIX po: <http://purl.org/ontology/po/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?t ?o
WHERE {
  <http://www.bbc.co.uk/programmes/b006m86d#programme> po:episode ?e .
  ?e a po:Episode .
  ?e po:short_synopsis ?o .
  ?e dc:title ?t
}
## Get short synopses of EastEnders episodes (with graph)
PREFIX po: <http://purl.org/ontology/po/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?g ?t ?o
WHERE {
  graph ?g {
    <http://www.bbc.co.uk/programmes/b006m86d#programme> po:episode ?e .
    ?e a po:Episode .
    ?e po:short_synopsis ?o .
    ?e dc:title ?t
  }
}
## Get reviews in which John Paul Jones has been involved
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX po: <http://purl.org/ontology/po/>
SELECT DISTINCT ?r_name, ?rev
WHERE {
  { <http://www.bbc.co.uk/music/artists/4490113a-3880-4f5b-a39b-105bfceaed04#artist> foaf:made ?r1 .
    ?r1 a mo:Record .
    ?r1 dc:title ?r_name .
    ?r1 rev:hasReview ?rev }
  UNION
  { <http://www.bbc.co.uk/music/artists/4490113a-3880-4f5b-a39b-105bfceaed04#artist> mo:member_of ?b1 .
    ?b1 foaf:made ?r1 .
    ?r1 a mo:Record .
    ?r1 dc:title ?r_name .
    ?r1 rev:hasReview ?rev }
}
To retrieve all triples for each entity in a given list of entity URIs, one might use the following syntax:
SELECT ?p ?o WHERE { ?s ?p ?o . FILTER ( ?s IN (<someGraph#entity1>, <someGraph#entity2>, ...<someGraph#entityN> ) ) }
So to demonstrate this feature, execute the following query:
SQL>SPARQL SELECT DISTINCT ?p ?o WHERE { ?s ?p ?o . FILTER ( ?s IN (<http://dbpedia.org/resource/Climate_change>, <http://dbpedia.org/resource/Social_vulnerability> ) ) } LIMIT 100 p o ANY ANY _______________________________________________________________________________ http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://s.zemanta.com/ns#Target http://s.zemanta.com/ns#title Climate change http://s.zemanta.com/ns#targetType http://s.zemanta.com/targets#rdf 3 Rows. -- 10 msec.
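The effect of the FILTER ( ?s IN (...) ) restriction can be mimicked over an in-memory list of triples. This is a hypothetical stdlib-only sketch; the `ex:` names are made up for illustration.

```python
# Triples as (subject, predicate, object) tuples; names are illustrative only.
triples = [
    ("ex:entity1", "rdf:type", "ex:Thing"),
    ("ex:entity2", "dc:title", "Second entity"),
    ("ex:entity3", "dc:title", "Not selected"),
]
wanted = {"ex:entity1", "ex:entity2"}  # the IN (...) list of entity URIs

# FILTER ( ?s IN (...) ) keeps only bindings whose subject is in the list;
# the projection then returns the ?p ?o pairs of the surviving triples.
rows = [(p, o) for (s, p, o) in triples if s in wanted]
print(rows)  # [('rdf:type', 'ex:Thing'), ('dc:title', 'Second entity')]
```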
To find albums by searching on album name, one might use the following syntax:
SQL>SPARQL SELECT ?s ?o ?an ( bif:search_excerpt ( bif:vector ( 'In', 'Your' ) , ?o ) ) WHERE { ?s rdf:type mo:Record . ?s foaf:maker ?a . ?a foaf:name ?an . ?s dc:title ?o . FILTER ( bif:contains ( ?o, '"in your"' ) ) } LIMIT 10; http://musicbrainz.org/music/record/30f13688-b9ca-4fa5-9430-f918e2df6fc4 China in Your Hand Fusion China <b>in</b> <b>Your</b> Hand. http://musicbrainz.org/music/record/421ad738-2582-4512-b41e-0bc541433fbc China in Your Hand T'Pau China <b>in</b> <b>Your</b> Hand. http://musicbrainz.org/music/record/01acff2a-8316-4d4b-af93-97289e164379 China in Your Hand T'Pau China <b>in</b> <b>Your</b> Hand. http://musicbrainz.org/music/record/4fe99b06-ac73-40dd-8be7-bdaefb014981 China in Your Hand T'Pau China <b>in</b> <b>Your</b> Hand. http://musicbrainz.org/music/record/ac1cb011-6040-4515-baf2-59551a9884ac In Your Hands Stella One Eleven <b>In</b> <b>Your</b> Hands. http://dbtune.org/magnatune/album/mercy-inbedinst In Your Bed - instrumental mix Mercy Machine <b>In</b> <b>Your</b> Bed mix. http://musicbrainz.org/music/record/a09ae12e-3694-4f68-bf25-f6ff4f790962 A Word in Your Ear Alfie A Word <b>in</b> <b>Your</b> Ear. http://dbtune.org/magnatune/album/mercy-inbedremix In Your Bed - the remixes Mercy Machine <b>In</b> <b>Your</b> Bed the remixes. http://musicbrainz.org/music/record/176b6626-2a25-42a7-8f1d-df98bec092b4 Smoke Gets in Your Eyes The Platters Smoke Gets <b>in</b> <b>Your</b> Eyes. http://musicbrainz.org/music/record/e617d90e-4f86-425c-ab97-efdf4a8a452b Smoke Gets in Your Eyes The Platters Smoke Gets <b>in</b> <b>Your</b> Eyes.
Note that the query will not return anything when the search terms occur only in separate triples, such as:
<x> <y> "In"
<z> <q> "Your"
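The reason is that bif:contains evaluates the text pattern against one object value at a time, so a phrase split across two triples never matches. A quick Python analogy with made-up object values:

```python
# Object values of indexed triples; "In" and "Your" live in separate objects.
objects = ["Smoke Gets in Your Eyes", "In", "Your"]

# A phrase match must succeed within a single object value,
# so the two one-word objects never match "in your".
matches = [o for o in objects if "in your" in o.lower()]
print(matches)  # ['Smoke Gets in Your Eyes']
```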
To get films from DBpedia whose titles contain given query terms, one might use the following syntax:
SQL>SPARQL SELECT ?s ?an ?dn ?o( bif:search_excerpt ( bif:vector ( 'Broken', 'Flowers' ) , ?o ) ) WHERE { ?s rdf:type dbpedia-owl:Film . ?s dbpprop:name ?o . FILTER ( bif:contains ( ?o, '"broken flowers"' ) ) OPTIONAL { ?s dbpprop:starring ?starring .} OPTIONAL { ?s dbpprop:director ?director . } OPTIONAL { ?starring dbpprop:name ?an . } OPTIONAL { ?director dbpprop:name ?dn . } }; http://dbpedia.org/resource/Broken_Flowers Tilda Swinton Jim Jarmusch Broken Flowers <b>Broken</b> <b>Flowers</b>. http://dbpedia.org/resource/Broken_Flowers Swinton, Tilda Jim Jarmusch Broken Flowers <b>Broken</b> <b>Flowers</b>. .... http://dbpedia.org/resource/Broken_Flowers Bill Murray Jim Jarmusch Music from Broken Flowers Music from <b>Broken</b> <b>Flowers</b>. ....
As before, the query will not return anything when the search terms occur only in separate triples, such as:
<x> <y> "Broken"
<z> <q> "Flowers"
This example truncates a dateTime column to its date part and performs a group by on that value:
-- prepare the data by inserting triples in a graph: SQL>SPARQL INSERT INTO GRAPH <http://BookStore.com> { <http://www.dajobe.org/foaf.rdf#i> <http://purl.org/dc/elements/1.1/title> "SPARQL and RDF" . <http://www.dajobe.org/foaf.rdf#i> <http://purl.org/dc/elements/1.1/date> <1999-01-01T00:00:00>. <http://www.w3.org/People/Berners-Lee/card#i> <http://purl.org/dc/elements/1.1/title> "Design notes" . <http://www.w3.org/People/Berners-Lee/card#i> <http://purl.org/dc/elements/1.1/date> <2001-01-01T00:00:00>. <http://www.w3.org/People/Connolly/#me> <http://purl.org/dc/elements/1.1/title> "Fundamentals of Compiler Design" . <http://www.w3.org/People/Connolly/#me> <http://purl.org/dc/elements/1.1/date> <2002-01-01T00:00:00>. <http://www.ivan-herman.net/foaf.rdf#me> <http://purl.org/dc/elements/1.1/title> "RDF Store" . <http://www.ivan-herman.net/foaf.rdf#me> <http://purl.org/dc/elements/1.1/date> <2001-03-05T00:00:00>. <http://bblfish.net/people/henry/card#me> <http://purl.org/dc/elements/1.1/title> "Design RDF notes" . <http://bblfish.net/people/henry/card#me> <http://purl.org/dc/elements/1.1/date> <2001-01-01T00:00:00>. <http://hometown.aol.com/chbussler/foaf/chbussler.foaf#me> <http://purl.org/dc/elements/1.1/title> "RDF Fundamentals" . <http://hometown.aol.com/chbussler/foaf/chbussler.foaf#me> <http://purl.org/dc/elements/1.1/date> <2002-01-01T00:00:00>. }; _______________________________________________________ Insert into <http://BookStore.com>, 12 triples -- done -- Find Count of Group by Dates SQL>SPARQL SELECT (xsd:date(bif:subseq(str(?a_dt), 0, 10))), count(*) FROM <http://BookStore.com> WHERE { ?s <http://purl.org/dc/elements/1.1/date> ?a_dt } GROUP BY (xsd:date(bif:subseq(str(?a_dt), 0, 10))); callret-0 callret-1 VARCHAR VARCHAR __________________________________________________ 1999-01-01 1 2001-01-01 2 2002-01-01 2 2001-03-05 1 4 Rows. -- 15 msec. SQL>
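The truncation-and-grouping step above amounts to keeping the first ten characters (YYYY-MM-DD) of each dateTime string, as bif:subseq does, and counting per distinct prefix. A small Python sketch using the same dates as the example:

```python
from collections import Counter

# The dc:date values inserted into the example graph, as dateTime strings.
dates = [
    "1999-01-01T00:00:00",
    "2001-01-01T00:00:00",
    "2002-01-01T00:00:00",
    "2001-03-05T00:00:00",
    "2001-01-01T00:00:00",
    "2002-01-01T00:00:00",
]

# bif:subseq (str (?a_dt), 0, 10) keeps the YYYY-MM-DD prefix;
# grouping on that prefix and counting reproduces the result table.
counts = Counter(d[:10] for d in dates)
print(sorted(counts.items()))
```

This yields one row per distinct date with its count, matching the four rows of the transcript above.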
Virtuoso extends SPARQL with expressions in results, subqueries, aggregates and grouping. These extensions allow a straightforward translation of arbitrary SQL queries to SPARQL. The extension is called "SPARQL BI" because its primary objective is to meet the needs of Business Intelligence. The extended features apply equally to querying physical quads or relational tables mapped through RDF views.
In this section, many examples use the TPC-H namespace. You may test them on your local demo database. They use data from the TPC-H dataset, mapped into a graph with an IRI of the form http://example.com/tpch. When testing, replace the placeholder host name "example.com" with the host name of your own installation, exactly as specified by the "DefaultHost" parameter in the [URIQA] section of the Virtuoso configuration file.
Virtuoso extends SPARQL with SQL-like aggregate and GROUP BY functionality. This functionality is also available by embedding SPARQL text inside SQL, but the SPARQL extension syntax has the benefit of also working over the SPARQL protocol and of looking more SPARQL-like.
The supported aggregates are COUNT, MIN, MAX, AVG and SUM. These can take an optional DISTINCT keyword. These are permitted only in the selection part of a select query. If a selection list consists of a mix of variables and aggregates, the non-aggregate selected items are considered to be grouping columns and a GROUP BY over them is implicitly added at the end of the generated SQL query. Virtuoso also supports explicit syntax for GROUP BY, ORDER BY, LIMIT and OFFSET. There is no explicit syntax for HAVING in Virtuoso SPARQL.
If a selection consists exclusively of aggregates, the result set has one row with the values of the aggregates. If there are both aggregates and variables in the selection, the result set has as many rows as there are distinct combinations of the variables; the aggregates are then calculated over each such distinct combination, as if there were a SQL GROUP BY over all non-aggregates. The implicit grouping pays attention to all subexpressions in the return list; for example, if a result column expression is (?x * max(?y)), then ?y is aggregated and ?x is not, so the results are grouped by ?x. This also means that if a result column expression is (bif:year (?shipdate)), a group is made for each distinct ?shipdate, i.e. up to 366 groups for each distinct year. If you need one group per year, write an explicit GROUP BY (bif:year (?shipdate)).
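The implicit-grouping rule for a result column like (?x * max(?y)) can be illustrated with plain Python over some hypothetical bindings:

```python
from collections import defaultdict

# Hypothetical (?x, ?y) bindings produced by the WHERE clause.
rows = [(2, 10), (2, 30), (5, 7)]

# For the result column (?x * max(?y)): ?y is aggregated, ?x is not,
# so the engine implicitly groups by ?x and takes max(?y) per group.
groups = defaultdict(list)
for x, y in rows:
    groups[x].append(y)
result = sorted((x, x * max(ys)) for x, ys in groups.items())
print(result)  # [(2, 60), (5, 35)]
```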
With the count aggregate the argument may be either *, meaning counting all rows, or a variable name, meaning counting all the rows where this variable is bound. If there is no implicit GROUP BY, there can be an optional DISTINCT keyword before the variable that is the argument of an aggregate.
There is a special syntax for counting distinct combinations of selected variables. This is:
SELECT COUNT DISTINCT ?v1 ... ?vn FROM ....
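Semantically, this counts the distinct combinations of the selected variables, i.e. the number of distinct result tuples. In Python terms, with made-up rows:

```python
# Hypothetical (?v1, ?v2, ?v3) result rows, one containing a duplicate.
rows = [
    ("s1", "p1", "o1"),
    ("s1", "p1", "o1"),  # duplicate combination
    ("s1", "p2", "o2"),
]

# COUNT DISTINCT ?v1 ?v2 ?v3 counts distinct tuples of the variables.
distinct_count = len(set(rows))
print(distinct_count)  # 2
```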
User-defined aggregate functions are not supported in the current version of the SPARQL compiler.
Virtuoso supports property dereference paths in SPARQL: simple paths may appear directly in expressions, while transitivity is handled by a separate feature.
If the property is known to be set (for example, by an RDF View), +> should be used; it expands to an ordinary triple pattern, whereas |> expands to an OPTIONAL pattern that leaves the value unbound when the property is absent.
Simple Example
SELECT ?f+>foaf:name ?f|>foaf:mbox WHERE { ?x foaf:name "Alice" . ?x foaf:knows ?f . FILTER (?f+>foaf:name = "John") }
means:
SELECT ?fname ?mbox WHERE { ?x foaf:name "Alice" . ?x foaf:knows ?f . ?f foaf:name ?fname . OPTIONAL {?f foaf:mbox ?mbox} . ?f foaf:name "John" . }
Other Examples
SPARQL DEFINE sql:signal-void-variables 1
PREFIX tpcd: <http://www.openlinksw.com/schemas/tpcd#>
PREFIX oplsioc: <http://www.openlinksw.com/schemas/oplsioc#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT
  ?l+>tpcd:returnflag,
  ?l+>tpcd:linestatus,
  sum(?l+>tpcd:linequantity) as ?sum_qty,
  sum(?l+>tpcd:lineextendedprice) as ?sum_base_price,
  sum(?l+>tpcd:lineextendedprice*(1 - ?l+>tpcd:linediscount)) as ?sum_disc_price,
  sum(?l+>tpcd:lineextendedprice*(1 - ?l+>tpcd:linediscount)*(1+?l+>tpcd:linetax)) as ?sum_charge,
  avg(?l+>tpcd:linequantity) as ?avg_qty,
  avg(?l+>tpcd:lineextendedprice) as ?avg_price,
  avg(?l+>tpcd:linediscount) as ?avg_disc,
  count(1) as ?count_order
FROM <http://example.com/tpcd>
WHERE {
  ?l a tpcd:lineitem .
  FILTER (?l+>tpcd:shipdate <= bif:dateadd ("day", -90, '1998-12-01'^^xsd:date))
}
ORDER BY ?l+>tpcd:returnflag ?l+>tpcd:linestatus
SPARQL DEFINE sql:signal-void-variables 1
PREFIX tpcd: <http://www.openlinksw.com/schemas/tpcd#>
SELECT
  ?supp+>tpcd:acctbal,
  ?supp+>tpcd:name,
  ?supp+>tpcd:has_nation+>tpcd:name as ?nation_name,
  ?part+>tpcd:partkey,
  ?part+>tpcd:mfgr,
  ?supp+>tpcd:address,
  ?supp+>tpcd:phone,
  ?supp+>tpcd:comment
FROM <http://example.com/tpcd>
WHERE {
  ?ps a tpcd:partsupp; tpcd:has_supplier ?supp; tpcd:has_part ?part .
  ?supp+>tpcd:has_nation+>tpcd:has_region tpcd:name 'EUROPE' .
  ?part tpcd:size 15 .
  ?ps tpcd:supplycost ?minsc .
  { SELECT ?part min(?ps+>tpcd:supplycost) as ?minsc
    WHERE {
      ?ps a tpcd:partsupp; tpcd:has_part ?part; tpcd:has_supplier ?ms .
      ?ms+>tpcd:has_nation+>tpcd:has_region tpcd:name 'EUROPE' .
    }
  }
  FILTER (?part+>tpcd:type like '%BRASS')
}
ORDER BY desc (?supp+>tpcd:acctbal) ?supp+>tpcd:has_nation+>tpcd:name ?supp+>tpcd:name ?part+>tpcd:partkey
SPARQL SELECT COUNT (*) FROM <http://mygraph.com> WHERE {?s ?p ?o}
Example: count of object values for each distinct predicate
SPARQL define input:inference "http://mygraph.com" SELECT ?p COUNT (?o) FROM <http://mygraph.com> WHERE {?s ?p ?o}
SPARQL define input:inference "http://mygraph.com" SELECT COUNT (?p) COUNT (?o) COUNT (DISTINCT ?o) FROM <http://mygraph.com> WHERE {?s ?p ?o}
SPARQL define input:inference "http://mygraph.com" SELECT count distinct ?s ?p ?o FROM <http://mygraph.com> WHERE {?s ?p ?o}
SPARQL
prefix tpch: <http://www.openlinksw.com/schemas/tpch#>
SELECT ?status count(*) sum(?extendedprice)
FROM <http://localhost.localdomain:8310/tpch>
WHERE
  {
    ?l a tpch:lineitem ;
      tpch:lineextendedprice ?extendedprice ;
      tpch:linestatus ?status .
  }
Example: A dataset of people, some duplicated
Suppose there is a dataset with many people, some of them sharing the same name. To list them we would, ideally, execute the query:
SPARQL SELECT DISTINCT (?name) ?person ?mail WHERE { ?person rdf:type foaf:Person . ?person foaf:name ?name . ?person foaf:mbox_sha1sum ?mail }
Unfortunately, the facility to apply DISTINCT to a part of the result set row (i.e. to ?name) does not currently exist. (Although the above form is permitted, it's interpreted as being identical to 'SELECT DISTINCT ?name, ?person, ?mail WHERE ...') If there's demand for such a feature then we may introduce an aggregate called, say, SPECIMEN, that will return the very first of the aggregated values. e.g.:
SPARQL SELECT ?name (specimen(?person)) (specimen(?mail)) WHERE { ?person rdf:type foaf:Person . ?person foaf:name ?name . ?person foaf:mbox_sha1sum ?mail }
As a workaround to this limitation, the MIN aggregate can be used, provided duplicates are few and there's no requirement that ?person should correspond to ?mail (i.e. the result should contain some person node and some mail node but they don't have to be connected by foaf:mbox_sha1sum):
SPARQL SELECT ?name (min(?person)) (min(?mail)) WHERE { ?person rdf:type foaf:Person . ?person foaf:name ?name . ?person foaf:mbox_sha1sum ?mail }
Otherwise, a complicated query is needed:
SPARQL
SELECT ?name
  ((SELECT (min (?person3))
    WHERE { ?person3 rdf:type foaf:Person .
            ?person3 foaf:name ?name .
            ?person3 foaf:mbox_sha1sum ?mail } )) as ?person
  ?mail
WHERE
  {
    { SELECT distinct ?name
      WHERE { ?person1 rdf:type foaf:Person .
              ?person1 foaf:name ?name .
              ?person1 foaf:mbox_sha1sum ?mail1 } }
    { SELECT ?name (min(?mail2)) as ?mail
      WHERE { ?person2 rdf:type foaf:Person .
              ?person2 foaf:name ?name .
              ?person2 foaf:mbox_sha1sum ?mail2 } }
  }
The following example demonstrates how to query DBpedia. Suppose there is a local ontology that has a datatype property hasLocation whose string values contain city names. The query below finds which of those cities are in DBpedia:
SPARQL
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX vocab: <http://myexample.com/localOntology.rdf>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX dbpres: <http://dbpedia.org/resource/>
SELECT ?city
WHERE
  {
    ?sub vocab:hasLocation ?city .
    FILTER (bif:exists ((ASK { ?subdb a dbo:City . ?subdb dbpprop:officialName ?city })))
  }
## Example: Find which town or city in
## the UK has the largest proportion of students.
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpedia-owl-uni: <http://dbpedia.org/ontology/University/>
PREFIX dbpedia-owl-inst: <http://dbpedia.org/ontology/EducationalInstitution/>
SELECT ?town COUNT(?uni) ?pgrad ?ugrad MAX(?population)
  ( ((?pgrad+?ugrad)/ MAX(?population))*100 ) AS ?percentage
WHERE
  {
    ?uni dbpedia-owl-inst:country dbpedia:United_Kingdom ;
      dbpedia-owl-uni:postgrad ?pgrad ;
      dbpedia-owl-uni:undergrad ?ugrad ;
      dbpedia-owl-inst:city ?town .
    OPTIONAL { ?town dbpedia-owl:populationTotal ?population .
               FILTER (?population > 0) }
  }
GROUP BY ?town ?pgrad ?ugrad
HAVING ( ( ( (?pgrad+?ugrad)/ MAX(?population) )*100 ) > 0 )
ORDER BY DESC 6
The following example demonstrates how to aggregate distance values over years.
First we insert some data into a graph named, for example, <urn:dates:distances>:
SQL> SPARQL
INSERT INTO GRAPH <urn:dates:distances>
  {
    <:a1>  <http://purl.org/dc/elements/1.1/date> <2010-12-23T00:00:00> .
    <:a1>  <http://linkedgeodata.org/vocabulary#distance> <0.955218675> .
    <:a2>  <http://purl.org/dc/elements/1.1/date> <2010-12-24T00:00:00> .
    <:a2>  <http://linkedgeodata.org/vocabulary#distance> <0.798155989> .
    <:a3>  <http://purl.org/dc/elements/1.1/date> <2010-12-25T00:00:00> .
    <:a3>  <http://linkedgeodata.org/vocabulary#distance> <0.064686628> .
    <:a4>  <http://purl.org/dc/elements/1.1/date> <2010-12-26T00:00:00> .
    <:a4>  <http://linkedgeodata.org/vocabulary#distance> <0.279800332> .
    <:a5>  <http://purl.org/dc/elements/1.1/date> <2010-12-27T00:00:00> .
    <:a5>  <http://linkedgeodata.org/vocabulary#distance> <0.651255995> .
    <:a6>  <http://purl.org/dc/elements/1.1/date> <2010-12-28T00:00:00> .
    <:a6>  <http://linkedgeodata.org/vocabulary#distance> <0.094410557> .
    <:a7>  <http://purl.org/dc/elements/1.1/date> <2010-12-29T00:00:00> .
    <:a7>  <http://linkedgeodata.org/vocabulary#distance> <0.43461913> .
    <:a8>  <http://purl.org/dc/elements/1.1/date> <2010-12-30T00:00:00> .
    <:a8>  <http://linkedgeodata.org/vocabulary#distance> <0.264862918> .
    <:a9>  <http://purl.org/dc/elements/1.1/date> <2010-12-31T00:00:00> .
    <:a9>  <http://linkedgeodata.org/vocabulary#distance> <0.770588658> .
    <:a10> <http://purl.org/dc/elements/1.1/date> <2011-01-01T00:00:00> .
    <:a10> <http://linkedgeodata.org/vocabulary#distance> <0.900997627> .
    <:a11> <http://purl.org/dc/elements/1.1/date> <2011-01-02T00:00:00> .
    <:a11> <http://linkedgeodata.org/vocabulary#distance> <0.324972375> .
    <:a12> <http://purl.org/dc/elements/1.1/date> <2011-01-03T00:00:00> .
    <:a12> <http://linkedgeodata.org/vocabulary#distance> <0.937221226> .
    <:a13> <http://purl.org/dc/elements/1.1/date> <2011-01-04T00:00:00> .
    <:a13> <http://linkedgeodata.org/vocabulary#distance> <0.269511925> .
    <:a14> <http://purl.org/dc/elements/1.1/date> <2011-01-05T00:00:00> .
    <:a14> <http://linkedgeodata.org/vocabulary#distance> <0.726014538> .
    <:a15> <http://purl.org/dc/elements/1.1/date> <2011-01-06T00:00:00> .
    <:a15> <http://linkedgeodata.org/vocabulary#distance> <0.843581439> .
    <:a16> <http://purl.org/dc/elements/1.1/date> <2011-01-07T00:00:00> .
    <:a16> <http://linkedgeodata.org/vocabulary#distance> <0.835685559> .
    <:a17> <http://purl.org/dc/elements/1.1/date> <2011-01-08T00:00:00> .
    <:a17> <http://linkedgeodata.org/vocabulary#distance> <0.673213742> .
    <:a18> <http://purl.org/dc/elements/1.1/date> <2011-01-09T00:00:00> .
    <:a18> <http://linkedgeodata.org/vocabulary#distance> <0.055026879> .
    <:a19> <http://purl.org/dc/elements/1.1/date> <2011-01-10T00:00:00> .
    <:a19> <http://linkedgeodata.org/vocabulary#distance> <0.987475424> .
    <:a20> <http://purl.org/dc/elements/1.1/date> <2011-01-11T00:00:00> .
    <:a20> <http://linkedgeodata.org/vocabulary#distance> <0.167315598> .
    <:a21> <http://purl.org/dc/elements/1.1/date> <2011-01-12T00:00:00> .
    <:a21> <http://linkedgeodata.org/vocabulary#distance> <0.545317103> .
    <:a22> <http://purl.org/dc/elements/1.1/date> <2011-01-13T00:00:00> .
    <:a22> <http://linkedgeodata.org/vocabulary#distance> <0.75137005> .
    <:a23> <http://purl.org/dc/elements/1.1/date> <2011-01-14T00:00:00> .
    <:a23> <http://linkedgeodata.org/vocabulary#distance> <0.123649985> .
    <:a24> <http://purl.org/dc/elements/1.1/date> <2011-01-15T00:00:00> .
    <:a24> <http://linkedgeodata.org/vocabulary#distance> <0.750214251> .
  };

callret-0
VARCHAR
_______________________________________________________________________________

Insert into <urn:dates:distances>, 48 (or less) triples -- done

1 Rows. -- 94 msec.
Then we execute the following query:
SQL> SPARQL
PREFIX dst: <http://linkedgeodata.org/vocabulary#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT (bif:year (bif:stringdate (?sdate)) AS ?syear)
       (bif:sum (bif:number (?dist)) AS ?distance)
FROM <urn:dates:distances>
WHERE
  {
    ?row dc:date ?sdate .
    ?row dst:distance ?dist
  }
GROUP BY (bif:year (bif:stringdate (?sdate)))
ORDER BY ASC (bif:year (bif:stringdate (?sdate)));

syear    distance
VARCHAR  VARCHAR
________________________________________________

2010     4.313598882
2011     8.891567721

2 Rows. -- 31 msec.
Inferencing is added to a SPARQL query only for those variables whose value is actually used. Thus,
SELECT COUNT (*) FROM <http://mygraph.com> WHERE {?s ?p ?o}
will not return inferred values since s, p, and o are not actually used. In contrast,
SPARQL SELECT COUNT (?s) COUNT (?p) COUNT (?o) FROM <http://mygraph.com> WHERE {?s ?p ?o}
will also return all the inferred triples.
Note: This difference in behaviour may lead to confusion and will, therefore, likely be altered in the future.
When expressions occur in result sets, many variables are often introduced only for the purpose of passing a value from a triple pattern to the result expression. This is inconvenient because many trivial triple patterns must be written. The presence of large numbers of variable names masks the "interesting" variables that are used more than once in patterns and that establish logical relationships between different parts of the query. As a solution, we introduce pointer operators.
The +> (pointer) operator allows referring to a property value without naming it as a variable and explicitly writing a triple pattern. We can shorten the example above to:
SPARQL
prefix tpch: <http://www.openlinksw.com/schemas/tpch#>
SELECT ?l+>tpch:linestatus count(*) sum(?l+>tpch:lineextendedprice)
FROM <http://localhost.localdomain:8310/tpch>
WHERE { ?l a tpch:lineitem }
The ?subject+>propertyname notation is equivalent to having a triple pattern ?subject propertyname ?aux_var binding an auxiliary variable to the mentioned property of the subject, within the group pattern enclosing the reference. For a SELECT, the enclosing group pattern is considered to be the top level pattern of the where clause or, in the event of a union, the top level of each term of the union. Each distinct pointer adds exactly one triple pattern to the enclosing group pattern. Multiple uses of +> with the same arguments do not each add a triple pattern. (Having multiple copies of an identical pattern might lead to changes in cardinality if multiple input graphs were being considered. If a lineitem had multiple discounts or extended prices, then we would get the cartesian product of both.)
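To make the expansion rule concrete, here is an illustrative sketch (the foaf prefix and data are assumed for the example, not taken from the datasets above). The pointer ?p+>foaf:name occurs twice, yet only one triple pattern is added:

```sparql
SPARQL
prefix foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?p+>foaf:name
WHERE { ?p a foaf:Person . FILTER (?p+>foaf:name != "") }

# Both mentions of ?p+>foaf:name refer to the same auxiliary variable,
# so the query expands to exactly one extra triple pattern:
#
# SELECT ?aux
# WHERE { ?p a foaf:Person . ?p foaf:name ?aux . FILTER (?aux != "") }
```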
If a property referenced via +> is absent, the variable on the left side of the operator is not bound in the enclosing group pattern because it should be bound in all triple patterns where it appears as a field, including implicitly added patterns.
The ?subject*>propertyname notation is introduced in order to access optional property values. It adds an OPTIONAL group OPTIONAL { ?subject propertyname ?aux_var }, not a plain triple pattern, so the binding of ?subject is not changed even if the object variable is not bound. If the property is set for all subjects in question then the results of *> and +> are the same. All other things being equal, the +> operator produces better SQL code than *> so use *> only when it is really needed.
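The difference between the two operators can be sketched as follows (hypothetical foaf data; the queries are illustrative, not from a specific dataset above). When some person lacks a mailbox, the two forms return different rows:

```sparql
SPARQL
prefix foaf: <http://xmlns.com/foaf/0.1/>

# +> adds a plain triple pattern ?p foaf:mbox ?aux,
# so persons without a foaf:mbox triple are dropped from the result:
SELECT ?p ?p+>foaf:mbox WHERE { ?p a foaf:Person }

# *> adds OPTIONAL { ?p foaf:mbox ?aux } instead,
# so every person is returned, with the mailbox column unbound
# where no foaf:mbox triple exists:
SELECT ?p ?p*>foaf:mbox WHERE { ?p a foaf:Person }
```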
Pure SPARQL does not allow binding a value that is not retrieved through a triple pattern. We lift this restriction by allowing expressions in the result set and providing names for result columns. We also allow a SPARQL SELECT statement to appear in another SPARQL statement in any place where a group pattern may appear. The names of the result columns form the names of the variables bound, using values from the returned rows. This resembles derived tables in SQL.
For instance, the following statement finds the prices of the 1000 order lines with the biggest discounts:
SPARQL
define sql:signal-void-variables 1
prefix tpch: <http://www.openlinksw.com/schemas/tpch#>
SELECT ?line ?discount (?extendedprice * (1 - ?discount)) as ?finalprice
FROM <http://localhost.localdomain:8310/tpch>
WHERE
  {
    ?line a tpch:lineitem ;
      tpch:lineextendedprice ?extendedprice ;
      tpch:linediscount ?discount .
  }
ORDER BY DESC (?extendedprice * ?discount)
LIMIT 1000
After ensuring that this query works correctly, we can use it to answer more complex questions. Imagine that we want to find out how big the customers are who have received the biggest discounts.
SPARQL
define sql:signal-void-variables 1
prefix tpch: <http://www.openlinksw.com/schemas/tpch#>
SELECT ?cust sum(?extendedprice2 * (1 - ?discount2)) max (?bigdiscount)
FROM <http://localhost.localdomain:8310/tpch>
WHERE
  {
    { SELECT ?line (?extendedprice * ?discount) as ?bigdiscount
      WHERE
        {
          ?line a tpch:lineitem ;
            tpch:lineextendedprice ?extendedprice ;
            tpch:linediscount ?discount .
        }
      ORDER BY DESC (?extendedprice * ?discount)
      LIMIT 1000 }
    ?line tpch:has_order ?order .
    ?order tpch:has_customer ?cust .
    ?cust tpch:customer_of ?order2 .
    ?order2 tpch:order_of ?line2 .
    ?line2 tpch:lineextendedprice ?extendedprice2 ;
      tpch:linediscount ?discount2 .
  }
ORDER BY (SUM(?extendedprice2 * (1 - ?discount2)) / MAX (?bigdiscount))
The inner select finds the 1000 biggest (in absolute value) discounts and their order lines. For each line we find orders of it, and the customer. For each customer found, we find all the orders he made and all the lines of each of the orders (variable ?line2).
Note that the inner select does not contain a FROM clause. None is required, because the inner select inherits the access permissions of all the outer queries. It is also important to note that the internal variable bindings of the subquery are not visible in the outer query; only the result set variables are bound. Similarly, variables bound in the outer query are not accessible to the subquery.
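The scoping rule can be sketched as follows (hypothetical foaf data; only ?p crosses the subquery boundary because only ?p appears in the subquery's result set):

```sparql
SPARQL
prefix foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?p ?name
WHERE
  {
    # The subquery binds ?n internally, but ?n is not in its result
    # set, so it is invisible to the outer query.
    { SELECT ?p WHERE { ?p foaf:name ?n . FILTER (?n != "") } }
    # To use the name outside, it must be bound again by a pattern
    # in the outer group:
    ?p foaf:name ?name .
  }
```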
Note also the declaration define sql:signal-void-variables 1 that forces the SPARQL compiler to signal errors if some variables cannot be bound due to misspelt names or attempts to make joins across disjoint domains. These diagnostics are especially important when the query is long.
In addition to expressions in filters and result sets, Virtuoso allows the use of expressions in triples of a CONSTRUCT pattern or WHERE pattern - an expression can be used instead of a constant or a variable name for a subject, predicate or object. When used in this context, the expression is surrounded by backquotes.
Example: With a WHERE Clause:
The following example returns all the distinct 'fragment' parts of all subjects in all graphs that have some predicate whose value is equal to 2+2.
SQL>SPARQL
SELECT distinct (bif:subseq (?s, bif:strchr (?s, '#')))
WHERE
  {
    graph ?g
      {
        ?s ?p `2+2` .
        FILTER (! bif:isnull (bif:strchr (?s, '#')))
      }
  };

callret
VARCHAR
----------
#four
Inside a WHERE part, every expression in a triple pattern is replaced with a new variable, and a filter expression is added to the enclosing group. The new filter is an equality between the new variable and the expression. Hence the sample above is identical to:
SPARQL
SELECT distinct (bif:subseq (?s, bif:strchr (?s, '#')))
WHERE
  {
    graph ?g
      {
        ?s ?p ?newvariable .
        FILTER (! bif:isnull (bif:strchr (?s, '#')))
        FILTER (?newvariable = (2+2)) .
      }
  }
Example: With CONSTRUCT
CONSTRUCT
  {
    <http://bio2rdf.org/interpro:IPR000181>
      <http://bio2rdf.org/ns/bio2rdf#hasLinkCount>
      `(SELECT (count(?s)) as ?countS
        WHERE { ?s ?p <http://bio2rdf.org/interpro:IPR000181> })`
  }
WHERE { ?s1 ?p1 ?o1 }
limit 1
The result should be:
<http://bio2rdf.org/interpro:IPR000181> <http://bio2rdf.org/ns/bio2rdf#hasLinkCount> "0"^^<http://www.w3.org/2001/XMLSchema#integer> .
Example: Inserting into a graph using an expression
SQL>SPARQL
insert into graph <http://MyNewGraph.com/>
  {
    <http://bio2rdf.org/interpro:IPR000181>
      <http://bio2rdf.org/ns/bio2rdf#hasLinkCount>
      `(SELECT (count(?s)) as ?countS
        WHERE { ?s ?p <http://bio2rdf.org/interpro:IPR000181> })`
  }
WHERE { ?s1 ?p1 ?o1 } limit 1 ;

callret-0
VARCHAR
_______________________________________________________________________________

Insert into <http://MyNewGraph.com/>, 1 triples -- done

1 Rows. -- 30 msec.