[Xesam] Details of tag handling

Wed Aug 27 00:44:27 PDT 2008

2008/8/27 Jamie McCracken <jamie.mccrack at googlemail.com>:
> On Tue, 2008-08-26 at 21:49 +0200, Mikkel Kamstrup Erlandsen wrote:
>> 2008/7/2 Sebastian Trüg <strueg at mandriva.com>:
>> > On Wednesday 02 July 2008 12:16:57 Mikkel Kamstrup Erlandsen wrote:
>> >> 2008/7/2 Sebastian Trüg <strueg at mandriva.com>:
>> >> > While this is easy to query it is not very clean semantically speaking
>> >> > since you link tag objects and tagged resources via string literals. In
>> >> > Nepomuk we use tag objects and relate tagged resources to the URI of this
>> >> > object.
>> >> >
>> >> > - Querying objects on their tags is also simple: just get all resources
>> >> > that are related to the tag via the tagging property.
>> >> > - Getting a list of all tags is as easy as before: list all resources of
>> >> > type xesam:Tag (or nao:Tag in Nepomuk)
>> >> > - Querying all files without a tag is also simple (in sparql we use a
>> >> > filter) - listing all files with their associated tag's names is also not
>> >> > a big deal (a sparql query can easily be extended or one fetches tags and
>> >> > their labels for each file separately)
>> >> > - renaming of tags is much simpler than with your method: only change
>> >> >  one property, the tag's label (in your version all tagged files have
>> >> >  to be updated).
>> >> > - deleting a tag is as simple as deleting the tag resource and all
>> >> > triples referencing it.
>> >> > - having tags without files is trivial: tag resources can live on their
>> >> > own anyway.
>> >> >
>> >> > Hope this helps a bit. Keep in mind that with tag resources we leave the
>> >> > world of two-dimensional metadata which can be handled by databases like
>> >> > lucene.
>> >>
>> >> That is the problem. However for tags specifically it may not be a
>> >> problem. If we can assume that users don't carry around tonnes of tags
>> >> at least. The application can first fetch a list of all Tag objects
>> >> and create a tag_name<->uri map. Then search for items with the
>> >> particular uri when the user requests some tagged items. As said this
>> >> does not scale very well.
>> >>
>> >> This could be remedied by having a convention on how to construct tag
>> >> uris. For example tag://<tag_name> or something more elaborate.
>> >>
>> >> So; bottom line - I am not as such opposed to using proper uri
>> >> relations for tags. We just need to agree on how to do it.
>> >
>> > The way I see it, the problem is: do we introduce triples into Xesam at this
>> > level or not, right?
>> > Because with triples it is easy (as already done in Nepomuk). But without
>> > them, plain strings are way simpler. The question is: do we want some
>> > hybrid solution which may be an overhead for both worlds?
>>
>> (This thread is item 6 on http://xesam.org/main/XesamUpdates)
>>
>> The more I think about this, the more I think the answer is "yes". I
>> do not object to the term "hybrid", but I actually still think the
>> approach is fairly clean (considering that Xesam utilizes a field
>> based data model).
>>
>> So to rehash:
>>
>>  * xesam:userKeywords stores a list of _opaque_ tag:// uris
>>  * Applications wanting to query tags first have to resolve tag-name
>> -> tag-uri however they prefer (on-the-fly or precomputed map)
>>  * This gives incredible flexibility compared to a flat list of tag
>> names stuffed in xesam:userKeywords. Flexibility both for the server
>> and client
>>
>> That's my opinion anyway. Do chime in.
>
> I would keep it simple as list of strings
>
> nepomuk could then associate a tag name with a tag object and get
> additional properties. I don't see why tag://mytag should not be stored
> as mytag? The problem for us is we would have to special case tags for
> indexing purposes and remove the uri. So it's more pain and no gain for
> all the non-nepomuk implementations

Don't you do this for file URLs already?

> tracker would use tag name to look into an emblems folder for things
> like icons so it would not store anything other than tag name

With proper linking to Tag objects, the Tag objects could have tag
color, icons and whatnot stored as fields. That removes the requirement
to look up and parse additional metadata on disk.
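
As a rough illustration of what I mean, here is a small Python sketch. It
is only a toy in-memory model: the Tag class, the color and icon fields,
and the helper names are all made up for this example and are not part of
the Xesam spec or any existing API.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    """Toy stand-in for a xesam:Tag object (field names are hypothetical)."""
    uri: str          # opaque identifier, as stored in xesam:userKeywords
    title: str        # human-readable tag name
    color: str = ""   # hypothetical extra field
    icon: str = ""    # hypothetical extra field


# Hypothetical tag store, keyed by opaque tag URI.
tag_store = {
    t.uri: t
    for t in [
        Tag("tag://1", "work", color="#cc0000", icon="briefcase"),
        Tag("tag://2", "holiday"),
    ]
}

# Hypothetical items; each carries a list of opaque tag URIs.
items = {
    "file:///report.odt": ["tag://1"],
    "file:///beach.jpg": ["tag://2"],
    "file:///notes.txt": [],
}


def build_name_map(store):
    """Resolve tag-name -> tag-uri (could be precomputed or done on the fly)."""
    return {tag.title: tag.uri for tag in store.values()}


def items_with_tag(items, store, name):
    """Return the sorted URIs of all items tagged with the named tag."""
    uri = build_name_map(store).get(name)
    if uri is None:
        return []
    return sorted(item for item, uris in items.items() if uri in uris)
```

With this model, items_with_tag(items, tag_store, "work") finds the items
by URI, and the icon comes straight from tag_store["tag://1"].icon with no
extra lookup on disk.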

> A lot of these *extras* are making it considerably more difficult and
> more time consuming to implement basic xesam stuff. The barrier of entry
> to xesam 1.0 should be kept as low as possible IMO

I am not 100% sure I understand your intentions... Is it to use flat
keywords *only* or to use the flat keywords as identifiers that can be
used to look up xesam:Tag objects?

I hope it is the latter (like I initially proposed in the thread),
because without xesam:Tag objects the tag use in Xesam is close to
worthless.

The benefit of using opaque URIs is that we get easy and cheap tag
renaming. Just change xesam:title on the right xesam:Tag object. If we
use the keyword as an id, all objects tagged with the keyword need to
be touched as well... As Sebastian also mentioned.
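
To put the renaming cost in concrete terms, here is a minimal Python
sketch. The dictionaries and function names are invented for illustration
only and do not correspond to any real Xesam or Tracker API; the point is
just to count how many objects each scheme has to write.

```python
# Opaque-URI scheme: items point at tag URIs, the name lives on the tag.
tags = {"tag://1": {"title": "holiday"}}
items_by_uri = {
    "file:///a.jpg": ["tag://1"],
    "file:///b.jpg": ["tag://1"],
    "file:///c.txt": [],
}

# Keyword-as-id scheme: items store the tag name itself.
items_by_name = {
    "file:///a.jpg": ["holiday"],
    "file:///b.jpg": ["holiday"],
    "file:///c.txt": [],
}


def rename_tag_uri(tags, tag_uri, new_title):
    """Rename by updating one field on the tag object. Returns writes done."""
    tags[tag_uri]["title"] = new_title
    return 1


def rename_tag_keyword(items, old_name, new_name):
    """Rename by rewriting every tagged item. Returns writes done."""
    writes = 0
    for item_uri, keywords in items.items():
        if old_name in keywords:
            items[item_uri] = [
                new_name if k == old_name else k for k in keywords
            ]
            writes += 1
    return writes
```

In the first scheme the rename is always a single write no matter how many
items carry the tag; in the second the number of writes grows with the
number of tagged items.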

Cheers,
Mikkel