From time to time I come back to playing with Wikidata. I will start entering some data and, invariably, tinker with the SPARQL service it provides. Every single time I need to relearn this, because I do not use it often enough. So this time I figured I would keep some notes I could more easily reference next time.

Wiki What? SPARQL Who?

Wikidata is a big open database of RDF triples (also called semantic triples). In essence, RDF triples come down to statements like: subject predicate object. An example would be: Ward is a human.

SPARQL (SPARQL Protocol and RDF Query Language, a recursive acronym) is a language for querying that information. Think SQL, but for RDF triple databases. Wikidata provides a service where you can enter SPARQL and query Wikidata.

What Makes a Query

The basic search has a SELECT and a WHERE. After the SELECT you list the variables (starting with a ?) that you want to be returned in the results. In the WHERE you can throw a bunch of statements about RDF triples. You can also group and order and such after the WHERE. I will go into those later.

SELECT ?item
WHERE {
  # Various statements that limit the number of matches
  # For example property P31 "instance of" Q146 "house cat"
  ?item wdt:P31 wd:Q146.
}
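If you would rather run a query like this from a script than from the web form, here is a minimal Python sketch. It assumes the public endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON result format. The function names and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"  # assumed public endpoint

QUERY = """
SELECT ?item
WHERE {
  ?item wdt:P31 wd:Q146.
}
LIMIT 5
"""


def build_request(query):
    """Wrap a SPARQL query in a GET request that asks for JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Placeholder value: put something that identifies your own script here.
        "User-Agent": "sparql-notes-example/0.1",
    }
    return urllib.request.Request(url, headers=headers)


def run_query(query):
    """Send the query and return one {variable: value} dict per result row."""
    with urllib.request.urlopen(build_request(query)) as response:
        data = json.load(response)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


# Needs network access, so the call is left as a comment:
# for row in run_query(QUERY):
#     print(row["item"])
```

Building the request separately from sending it makes it easy to inspect the URL before anything goes over the network.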

Prefixes

These will pop up a bit everywhere. They tell the SPARQL service how to interpret the Qnnn and Pnnn identifiers that follow the :. Do not worry if that does not make sense yet; it will once you see some code.

  • wd: Wikidata entity
  • wdt: Wikidata property, pointing directly at the value
  • p: Access the statement itself (see Qualifiers below)
  • ps: Access the main data from a statement (see Qualifiers below)
  • pq: Access a qualifier (see Qualifiers below)
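Under the hood each prefix is shorthand for the start of a full IRI. The Python sketch below spells out the IRIs that, to my knowledge, sit behind the five prefixes above, and shows how a prefixed name expands. The helper names are invented for this example.

```python
# Full IRIs behind the Wikidata prefixes listed above.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "p": "http://www.wikidata.org/prop/",
    "ps": "http://www.wikidata.org/prop/statement/",
    "pq": "http://www.wikidata.org/prop/qualifier/",
}


def expand(prefixed_name):
    """Turn a prefixed name such as wd:Q146 into its full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    return PREFIXES[prefix] + local


def declarations():
    """PREFIX lines to put above a query on a service that does not predefine them."""
    return "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items())


print(expand("wd:Q146"))
print(declarations())
```

On Wikidata's own query service you never need to type these declarations, but they are handy when a query moves to another engine.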

Statements

A regular statement inside a WHERE. You can turn each part of “subject property object” into a variable. Note that the Qnnn are entities in Wikidata, where nnn is the entity's ID in the database. Similarly, the Pnnn are properties in the Wikidata database.

# ?cat <property:instance of> <entity:house cat>
?cat wdt:P31 wd:Q146.
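To get a feel for what such a statement does, it can help to picture the database as a list of tuples and the statement as a pattern with holes in it. The following Python sketch is a toy model only: the subjects are made-up stand-ins rather than real Wikidata IDs, and the real service does not work this naively.

```python
# A toy triple store: every fact is a (subject, property, object) tuple.
# The subjects are invented stand-ins, not real Wikidata identifiers.
TRIPLES = [
    ("larry", "wdt:P31", "wd:Q146"),       # larry <instance of> <house cat>
    ("palmerston", "wdt:P31", "wd:Q146"),  # palmerston <instance of> <house cat>
    ("ward", "wdt:P31", "wd:Q5"),          # ward <instance of> <human>
]


def match(pattern, triples):
    """Yield one {variable: value} binding for every triple that fits the pattern."""
    for triple in triples:
        binding = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # A variable matches anything, but must stay consistent within one triple.
                if binding.setdefault(want, have) != have:
                    break
            elif want != have:
                # A fixed part has to match exactly.
                break
        else:
            yield binding


for row in match(("?cat", "wdt:P31", "wd:Q146"), TRIPLES):
    print(row)
```

Any of the three positions can hold a variable, so ("ward", "wdt:P31", "?what") asks the opposite question: what is Ward an instance of?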

Use ; to make several statements about one subject. End a statement with ; instead of . and the next line only needs a property and an object; it reuses the same subject. You can keep going like that, and the final statement ends with a . as usual.

# ?cat <instance of> <house cat>
?cat wdt:P31 wd:Q146;
     # <position held> <Chief Mouser to the Cabinet Office>
     wdt:P39 wd:Q198641.

Property Combinations

Use the forward slash / to chain properties.

# <subjects> <position held>/<part of> <objects>
?cat wdt:P39/wdt:P361 ?organisation.

is short for

# <subjects> <position held> <objects>
?cat wdt:P39 ?position.
# <previous objects, now subjects> <part of> <objects>
?position wdt:P361 ?organisation.
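The difference between the two spellings is only whether the middle step gets a name. A small Python sketch of that idea, using invented data (none of these links are taken from Wikidata):

```python
# Invented example data: subject -> list of objects, one dict per property.
POSITION_HELD = {
    "larry": ["chief mouser"],
    "palmerston": ["foreign office mouser"],
}
PART_OF = {
    "chief mouser": ["cabinet office"],
}


def two_statements(first, second):
    """The long form: the middle value is bound to a variable of its own."""
    rows = set()
    for cat, positions in first.items():                    # ?cat wdt:P39 ?position.
        for position in positions:
            for organisation in second.get(position, []):   # ?position wdt:P361 ?organisation.
                rows.add((cat, position, organisation))
    return rows


def chained(first, second):
    """The / form: same matches, but the middle value is never named."""
    return {(cat, organisation) for cat, _position, organisation in two_statements(first, second)}


print(chained(POSITION_HELD, PART_OF))
```

Note that palmerston drops out in both forms, because his position has no second step to follow in this toy data.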

[ ... ] can achieve a similar effect. The internet tells me it is more flexible than /, but potentially slower. It creates a “blank node” that you can use in place of the object. As far as the SPARQL grammar goes, it can also stand in for the subject (though not the property), but I cannot get that to work right now. The following seem to behave the same, but I cannot make guarantees.

# Two separate statements
?entity wdt:P39 ?position.
?position wdt:P263 wd:Q169101.
# Chained with /
?entity wdt:P39/wdt:P263 wd:Q169101.
# Blank node: entities <position held> [ <official residence> <10 Downing Street> ]
?entity wdt:P39 [ wdt:P263 wd:Q169101 ].

^ reverses the “subject property object” to “object property subject”. Note it only reverses the property it is in front of, not any others you might chain with /.

# <house cat> <reversed instance of> <the actual cats>
wd:Q146 ^wdt:P31 ?cat.

The multipliers * and + make a property match zero or more times and one or more times, respectively. In this example, that lets us get all the subclasses of a house cat, however many levels deep. The difference between * and + is that, in the example, the * will also return the house cat entity itself, while the + omits it.

# <cat types> <subclass of>* <house cat>
?cattypes wdt:P279* wd:Q146.
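Here is a rough Python sketch of the * versus + difference, walking a tiny made-up hierarchy. The class names and links are invented for illustration and do not reflect what Wikidata actually contains.

```python
# Invented hierarchy: class -> the classes it is a <subclass of>.
SUBCLASS_OF = {
    "seal point siamese": ["siamese"],
    "siamese": ["house cat"],
    "house cat": ["pet"],
    "pet": [],
}


def reachable(start, edges):
    """Everything you can reach from start by following one or more links."""
    seen, todo = set(), list(edges.get(start, []))
    while todo:
        node = todo.pop()
        if node not in seen:
            seen.add(node)
            todo.extend(edges.get(node, []))
    return seen


def plus(target, edges):
    """Like ?x wdt:P279+ target: subjects that reach target in one or more steps."""
    return {node for node in edges if target in reachable(node, edges)}


def star(target, edges):
    """Like ?x wdt:P279* target: the same, plus the zero-step match (target itself)."""
    return plus(target, edges) | {target}


print(sorted(plus("house cat", SUBCLASS_OF)))
print(sorted(star("house cat", SUBCLASS_OF)))
```

Both versions find the indirect subclass two levels down; only the * version includes house cat itself.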

Note that you can combine all the above into one funky matcher. Try to balance brevity and readability.

Qualifiers

You can use p: to get a reference to the entire statement instead of to the object. From such a statement, you can then get the object by using ps:. So, for example, rather than getting all entities holding the position of Chief Mouser, we can get the statement that says “entity holds position something”, then specify that the position held is Chief Mouser. Note we use the same property (P39) for each.

# The two statements below are the same as:
#   entities <position held> <Chief Mouser to the Cabinet Office>
#   ?cat wdt:P39 wd:Q198641.
?cat p:P39 ?positionheldstatement.
?positionheldstatement ps:P39 wd:Q198641.

“What is the point?”, I hear you ask. A statement can come with qualifiers. There have been several Chief Mousers, and for each of them the position held statement comes with qualifiers such as start time, end time, and series ordinal. Now that we have grabbed hold of a statement, we can reach those qualifiers.

?cat p:P39 ?positionheldstatement.
?positionheldstatement ps:P39 wd:Q198641.
# that statement <start time> capture in variable
?positionheldstatement pq:P580 ?started.
# that statement <end time> capture in variable
?positionheldstatement pq:P582 ?ended.

Note that the current Chief Mouser (at the time of writing: Larry) does not have an end date, so he would not actually appear in the results here. To make him reappear, you could wrap that final statement in an OPTIONAL { }.
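The effect of a required versus an optional end time can be sketched in Python as an inner join versus a left join. The years below are illustrative values typed in by hand, not fetched from Wikidata, and the function names are invented.

```python
# Illustrative values only: cat -> year, one dict per qualifier.
START = {"humphrey": 1989, "sybil": 2007, "larry": 2011}
END = {"humphrey": 1997, "sybil": 2009}  # the current holder has no end time


def both_required(start, end):
    """Both qualifier statements must match, so cats without an end time vanish."""
    return [(cat, start[cat], end[cat]) for cat in start if cat in end]


def end_optional(start, end):
    """End time wrapped in OPTIONAL: the row stays, the variable is simply unbound."""
    return [(cat, start[cat], end.get(cat)) for cat in start]


print(both_required(START, END))
print(end_optional(START, END))
```

In the second result larry is back, with None standing in for the unbound ?ended.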

More Keywords

Make a statement OPTIONAL. Without this, items that do not match the statement get filtered out: every statement not wrapped in an OPTIONAL must have a match.

OPTIONAL { statement }

Use UNION between groups of statements to combine their results.

{statement}
UNION
{statement}

DISTINCT removes duplicate rows from the results.

SELECT DISTINCT ...
WHERE {
}
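UNION and DISTINCT often show up together, because UNION keeps every row from both branches, including rows that both branches produce. A tiny Python sketch with invented rows:

```python
# Invented rows, as if each list came from one { } branch of a UNION.
BRANCH_ONE = ["larry", "humphrey"]
BRANCH_TWO = ["larry", "palmerston"]

# UNION: all rows from both branches, repeats included.
union = BRANCH_ONE + BRANCH_TWO

# DISTINCT: drop the repeated rows (dict.fromkeys keeps first-seen order).
distinct = list(dict.fromkeys(union))

print(union)
print(distinct)
```

Keeping first-seen order is a choice made here for readability; a query without ORDER BY promises no particular order.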

COUNT to get the number of matches. More efficient than returning all matches and counting them on your end. Note that you have to give a name to the result. Also note that you have to wrap the entire count-and-rename into parentheses or you will get a syntax error.

SELECT (COUNT(?cat) AS ?numberofcats)
WHERE {
  # cat <instance of> <house cat>
  ?cat wdt:P31 wd:Q146.
}

GROUP BY combines the results that share the same values for the variables you list. Note that everything that appears in the SELECT part has to be either a variable that is grouped by or something that gets aggregated by means of an expression such as COUNT.

SELECT ?position (COUNT(?cat) AS ?numberofcats)
WHERE {
  ?cat wdt:P31 wd:Q146.
  ?cat wdt:P39 ?position.
}
GROUP BY ?position

That requirement also applies to labels: a label variable in the SELECT has to appear in the GROUP BY too!

SELECT ?position ?positionLabel (COUNT(?cat) AS ?numberofcats)
WHERE {
  ?cat wdt:P31 wd:Q146.
  ?cat wdt:P39 ?position.

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". }
}
GROUP BY ?position ?positionLabel # MUST add it here too

Combine GROUP BY with HAVING if you want to put limits on the groups you get. You can only use HAVING in combination with a grouping. For example, if you only want positions held by at least two cats, do the following. Note the parentheses around the condition.

SELECT ?position (COUNT(?cat) AS ?numberofcats)
WHERE {
  ?cat wdt:P31 wd:Q146.
  ?cat wdt:P39 ?position.
}
GROUP BY ?position
HAVING (?numberofcats > 1)
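If GROUP BY, COUNT and HAVING feel abstract, this Python sketch does the same three steps by hand on a few invented rows. The data and function names are made up for the example.

```python
from collections import Counter

# Invented (cat, position) rows, as if they came out of the WHERE block.
ROWS = [
    ("larry", "chief mouser"),
    ("humphrey", "chief mouser"),
    ("palmerston", "foreign office mouser"),
]


def group_and_count(rows):
    """GROUP BY ?position with COUNT(?cat): one counter entry per position."""
    return Counter(position for _cat, position in rows)


def having_more_than(counts, minimum):
    """HAVING (?numberofcats > minimum): keep only the groups that pass."""
    return {position: n for position, n in counts.items() if n > minimum}


counts = group_and_count(ROWS)
print(dict(counts))
print(having_more_than(counts, 1))
```

The HAVING step works on the groups, after counting, which is why it cannot be expressed as an ordinary condition inside the WHERE.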

Use ORDER BY to sort your results. To choose ascending or descending order, wrap the variable in ASC() or DESC(). The default is ascending.

SELECT ?cat ?dob
WHERE {
  # cat <instance of> <house cat>
  ?cat wdt:P31 wd:Q146.
  # subject <date of birth> object
  ?cat wdt:P569 ?dob.
}
ORDER BY DESC(?dob)

Sometimes you get too many results. Throw in a LIMIT at the end to decide the maximum number of results.

SELECT ?cat ?dob
WHERE {
  # cat <instance of> <house cat>
  ?cat wdt:P31 wd:Q146.
  # subject <date of birth> object
  ?cat wdt:P569 ?dob.
}
ORDER BY DESC(?dob)
LIMIT 20

Other keywords that might come in handy: FILTER(), YEAR().
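In a query, a condition such as FILTER(YEAR(?dob) >= 2010) goes inside the WHERE block. To show how it combines with ORDER BY and LIMIT, here is the same pipeline done by hand in Python on invented rows (the cats and dates are placeholders, not Wikidata data).

```python
from datetime import date

# Invented (cat, date of birth) rows.
ROWS = [
    ("cat a", date(2007, 5, 1)),
    ("cat b", date(2012, 3, 9)),
    ("cat c", date(2016, 11, 20)),
]


def born_since(rows, year, limit):
    """Filter on the year, sort newest first, then cap the number of rows."""
    kept = [row for row in rows if row[1].year >= year]  # FILTER(YEAR(?dob) >= year)
    kept.sort(key=lambda row: row[1], reverse=True)      # ORDER BY DESC(?dob)
    return kept[:limit]                                  # LIMIT limit


print(born_since(ROWS, 2010, 20))
```

The order of the steps matters: the limit is applied last, so you get the 20 newest matches rather than 20 arbitrary ones.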

Label

Adding a label the Wikidata way. The variable name is implicit: the label service takes an existing variable name and appends Label to it. You can also use ?catDescription to get the description text.

SELECT ?cat ?catLabel
WHERE
{
  ?cat wdt:P31 wd:Q146.
  # Label in your language; failing that, the default for all languages (mul); failing that, English
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". }
}
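The language list works as a fallback chain: the first language that has a label wins. A Python sketch of that idea follows. The final fallback to the bare ID is my understanding of how the label service behaves rather than something stated above, so treat it as an assumption; the entity ID and labels are placeholders.

```python
def pick_label(entity_id, labels, languages):
    """Return the label for the first language in the list that has one."""
    for language in languages:
        if language in labels:
            return labels[language]
    # Assumption: with no label in any listed language, the bare ID is shown.
    return entity_id


# Placeholder entity and labels, purely for illustration.
labels = {"en": "cat", "nl": "kat"}
print(pick_label("Qnnn", labels, ["nl", "mul", "en"]))
print(pick_label("Qnnn", labels, ["fr", "mul", "en"]))
print(pick_label("Qnnn", {}, ["fr", "mul", "en"]))
```

In the real query, [AUTO_LANGUAGE] is a placeholder that the query service fills in with your interface language before the chain is applied.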

Adding a label slightly more generically (and less automatically) lets you pick the variable name. Note that this only selects a single language label. Without the FILTER on the language, each row is returned once for every language that has a label.

SELECT ?cat ?name
WHERE
{
  ?cat wdt:P31 wd:Q146.
  ?cat rdfs:label ?name.
  FILTER (LANG(?name) = "en")
}

QLever

I was going to go into QLever as well, which is a supposedly faster query engine. Besides Wikidata, its public instance also offers OpenStreetMap, IMDb, and a bunch of other data as triples. I think I will keep that for a possible future post. QLever’s Wikidata service should behave more or less the same as described here. It might require explicitly defining the Wikidata prefixes beforehand, though as far as I can tell it adds them for you automatically. Also note that Wikidata’s own service usually picks up edits within a few minutes, while in QLever it seems to take maybe a week.

Further Reading