[Xesam] Wrapping up for Xesam Search Spec RC3

Mikkel Kamstrup Erlandsen mikkel.kamstrup at gmail.com
Tue Aug 19 23:25:00 PDT 2008


2008/8/19 Michael Albinus <michael.albinus at gmx.de>:
> "Mikkel Kamstrup Erlandsen" <mikkel.kamstrup at gmail.com> writes:
>
>> Hi All,
>>
>> We need to get Xesam RC3 out in good time before the hackfest. I
>> personally hope at somewhere near the 1st of September, which should
>> give a little time for people to update their code before the
>> hackfest.
>>
>> I just flushed my buffers onto: http://xesam.org/main/XesamUpdates. Be
>> sure to read it.
>
> Here are some points I've accumulated last weeks. I should have raised
> them earlier, but now there's the finish for RC3 ...

First of all, thanks a lot for your review. It is most appreciated! :-)

HINT: If you are looking into this, you will find it helpful that you
can go directly to the ontology entries by going to
http://xesam.org/main/XesamOntology#LABEL, e.g.
http://xesam.org/main/XesamOntology#xesam:Contact.

> Typos in Ontology:

Let me first say that I/we are planning a workshop to review and
document the ontology at the hackfest coming up in September. This is
post RC3 however, so your comments should definitely be handled
before that. My general opinion is that all descriptions need to be a
lot more elaborate than just a few words, like they are now.

> - All descriptions have no space before "(". Looks ugly.

Agreed

> - Descriptions end sometimes with period, sometimes not. Maybe it could
>  be unified?

Agreed

> - xesam:Annotation: "specific annotation classes.."

Check, double dots

> - xesam:ContactGroup, xesam:replyTo, xesam:FreeBusy,
>  xesam:musicBrainzAlbumArtistID, xesam:musicBrainzAlbumID,
>  xesam:musicBrainzArtistID, xesam:musicBrainzFingerprint,
>  xesam:musicBrainzTrackID, xesam:contentCategory, xesam:sourceCategory:
>  Why are the descriptions links?

Short: the names are wiki words. This is a bug in the script we use to
generate the page. I'll look into this.

> - xesam:Folder: "on occasion this rule may *be* violated" (but please
>  don't trust my English)

Check

> - xesam:Media: "data bit depth, configuration" (missing space after
>  comma)

Check

> - xesam:SourceCode: "Source code"

Right

> - xesam:Text: "using other classes"

clsses -> classes. Gotcha

> - xesam:compressionAlgorithm: Incomplete description?

Looks like the description got sni

> Other questions in Ontology:
>
> - xesam:author: Why is it a list of strings? The description says
>  "Primary contributor", which is singular.
>
> - xesam:contributor: If it is a list of strings, the description shall
>  say "Secondary contributors".

You can probably find this inconsistency elsewhere. It relates to the
following: In the formal definition of the ontology (and RDF/XML file)
each field is really defined as a relation. Like

  /home/mikkel/foo.pdf   xesam:author   'Mikkel Kamstrup'

which read aloud would be

  /home/mikkel/foo.pdf has author Mikkel Kamstrup

The field definition then states what cardinality the relation has.
In the case of xesam:author the cardinality is unlimited, meaning
that the relation "has author" may be present an arbitrary number of
times. In my wiki-compiling script I changed this into "List of
strings" because I found that this is what most people would expect,
and indeed also what we return over DBus.

I am not entirely sure how to handle this. Maybe just changing the
description like you hint would be sufficient.
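
As a rough sketch of the mapping described above (relations with unlimited
cardinality turning into a list of values), assuming hypothetical sample
triples and a made-up cardinality table; the second author, the
MULTI_VALUED table and collect_fields are illustrative names only, not
part of the Xesam spec or the wiki-compiling script:

```python
from collections import defaultdict

# Hypothetical (subject, field, value) triples, for illustration only.
TRIPLES = [
    ("/home/mikkel/foo.pdf", "xesam:author", "Mikkel Kamstrup"),
    ("/home/mikkel/foo.pdf", "xesam:author", "Another Author"),
    ("/home/mikkel/foo.pdf", "xesam:isEncrypted", False),
]

# Assumed cardinalities: True means the relation may repeat.
MULTI_VALUED = {"xesam:author": True, "xesam:isEncrypted": False}


def collect_fields(triples, multi_valued):
    """Group triples per subject; repeatable relations become lists."""
    result = defaultdict(dict)
    for subject, field, value in triples:
        fields = result[subject]
        if multi_valued.get(field, True):
            # Unlimited cardinality: every occurrence is appended.
            fields.setdefault(field, []).append(value)
        else:
            # Single-valued relation: keep one plain value.
            fields[field] = value
    return dict(result)


FIELDS = collect_fields(TRIPLES, MULTI_VALUED)
```

With this shape a single-author document still yields a one-element list
for xesam:author, which is why a plural description would fit better.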

> - xesam:isEncrypted: I don't understand the meaning as list of
>  booleans. How are the respective values (true, false) mapped to the
>  parts?

I believe the value of this field should be a single boolean

> - xesam:paragrapCount: Shouldn't this be "xesam:paragraphCount"?

Check

> - xesam:eventTransparrent: Shouldn't this be "xesam:eventTransparent"?

Check

> - xesam:imdbId: Shouldn't this be "xesam:imdbID", like the other
>  xesam:...ID fields?

Check

> - xesam:taskCompleted, xesam:taskDue, xesam:taskPercentComplete: What
>  do the lists mean, when it is a single task?

I think all those fields were meant to be single valued

> - Sometimes, xesam:summary or xesam:snippet return "highlighted" text
>  (hits enclosed by <b>...</b>, for example). Is it possible to get an
>  indication for this? It influences, how the summary (or snippet) is
>  visualized by the Xesam client.

xesam:summary will contain a pregenerated summary of the text, either
extracted from a metadata field inside the file or from some chunk of
text inside the file. I don't know if we should set any standard for
the contents of this. Plain UTF-8 probably. xesam:snippet is another
matter. It is always generated on the fly, and highlights the matching
search terms if the engine knows how to do that.
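
To make the snippet case concrete, here is a minimal sketch of how an
engine might highlight matching terms on the fly. The <b>...</b> markup is
taken from the example in the question; nothing in the spec mandates it,
and the highlight function is a hypothetical helper, not a Xesam API:

```python
import html
import re


def highlight(text, terms):
    """Escape text as HTML and wrap each matching term in <b>...</b>."""
    terms = [t for t in terms if t]  # ignore empty search terms
    if not terms:
        return html.escape(text)
    # Try longer terms first so overlapping terms match greedily.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered),
                         re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        parts.append("<b>" + html.escape(match.group(0)) + "</b>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Escaping the surrounding text means a client can tell engine-inserted
markup apart from a literal "<b>" that happened to be in the document.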

> - If possible, I would like to get the line number for a given hit. Is
>  this xesam:lineCount? If yes, the description shall be precised. If
>  not ... are there still chances for a new attribute?

This is currently not supported. We would need to talk with the engine
maintainers to hear if this is possible, but in general I would fear
not.

> - Definitely for post-1.0-release: I miss attributes, describing hits
>  in a bug database, like Debian BTS, Bugzilla, ...

What do we really miss apart from a xesam:Bug content category?

> - What is the equivalent to Google's site clause (like site:example.com)?
>  Is it xesam:originURL? xesam:remoteServer? Maybe it can be documented.

I'm not sure we have anything apart from xesam:url. This however would
be prefixed with http:// in most cases which is probably not what
users want. Maybe we need to add this field.
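
For illustration, a client could approximate a site: clause today by
post-filtering on the host part of xesam:url. This is only a sketch under
that assumption; matches_site is a hypothetical helper and not an existing
Xesam field or query operator:

```python
from urllib.parse import urlsplit


def matches_site(url, site):
    """Return True if the host of url is site or a subdomain of it."""
    host = urlsplit(url).hostname  # None for plain paths without a host
    if host is None:
        return False
    host = host.lower()
    site = site.lower().strip(".")
    return host == site or host.endswith("." + site)
```

For example, matches_site("http://www.example.com/a", "example.com") is
true, while a local path such as /home/mikkel/foo.pdf never matches.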

Evgeny will hopefully elaborate on all of this when he is safely back
from holiday.

-- 
Cheers,
Mikkel

