[Clipart] question about releases

Jon Phillips jon at fabricatorz.com
Sun Apr 7 21:32:02 PDT 2013


This is great backend work, and it pairs well with the front-end push we
started to get more clipart and more activity.

More below...

On Mon, Apr 8, 2013 at 11:17 AM, Wolfgang Spraul
<wolfgang at fabricatorz.com> wrote:
> Francis,
>
> On Sun, Mar 24, 2013 at 09:11:32PM +0800, Francis Bond wrote:
>> Actually, I am using the OCAL images plus tags as data for an
>> assignment for my students.  If I could get the full meta-data soon (I
>> actually gave them the assignment last week, but they have four weeks
>> in all) they could match with it which would make the task much more
>> rewarding.
>
> I just posted another adhoc release where all 41k+ graphics have their
> metadata updated to reflect the tags we have in the database.
>
> http://openclipart.org/adhoc_release_all_svgs_2013-04-07.tar.bz2
> 1,337,143,578 bytes, md5sum c19df9a4...
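The announcement gives the exact byte count but only the first eight hex digits of the MD5 ("c19df9a4..."). A downloader could therefore check the size exactly and the MD5 only as a prefix. The minimal sketch below does that. The function names are hypothetical, and the full checksum should still be taken from the original announcement.

```python
import hashlib
import os

def md5_hex(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so a 1.3 GB tarball never sits in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_download(path, expected_size, md5_prefix):
    """True if the size matches exactly and the MD5 starts with md5_prefix."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_hex(path).startswith(md5_prefix.lower())

# Example (hypothetical local filename):
# check_download("adhoc_release_all_svgs_2013-04-07.tar.bz2", 1337143578, "c19df9a4")
```

A prefix match is only a sanity check against truncated or corrupted downloads, not a full integrity guarantee.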
>
> *) converted about 1500-2000 graphics from ISO-8859-1 to UTF-8. So
> going forward I think we should say that all files in the library
> are UTF-8
>
> *) resolved DTD entities in about 1500-2000 files. Those are cases
> where the namespace is, for example "&ns_svg;", referencing an
> ENTITY. If the namespace was given this way, I could not open the
> file in Inkscape. I ran xmllint --noent to resolve the references.
>
> *) manually fixed XML issues in about 50-100 files, deleted some
> others that were invalid or partial uploads.
>
> *) the old (now overwritten) metadata was preserved in files with
> .upload-metadata extension, for example
> http://openclipart.org/people/rejon/rejon_Supergirl.svg.upload-metadata
> I also created a tarball with all old metadata in it, just in case.
> If I hear nothing much back about the metadata, I will delete the
> .upload-metadata files in a week or so.
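The first two cleanup items above can be approximated in code. The sketch below is not the script that was actually used, and both function names are hypothetical. to_utf8() re-encodes one SVG file as UTF-8 and rewrites the encoding in its XML declaration. It trusts a declared ISO-8859-1 and otherwise tries UTF-8 before falling back to Latin-1. needs_entity_resolution() only flags files whose namespace is written as an entity reference such as "&ns_svg;". Resolving the entities is still left to xmllint --noent, as described in the message.

```python
import re

# encoding="..." inside the XML declaration (searched in the first 300 bytes only).
DECL_ENCODING = re.compile(rb'(<\?xml[^>]*?encoding\s*=\s*)(["\'])([A-Za-z0-9._-]+)\2')
# A namespace declaration whose value is an entity reference, e.g. xmlns:svg="&ns_svg;".
ENTITY_XMLNS = re.compile(rb'xmlns(?::[\w.-]+)?\s*=\s*["\']&[\w.-]+;["\']')

LATIN1_NAMES = {b"iso-8859-1", b"latin-1", b"latin1"}

def to_utf8(data):
    """Return the SVG bytes re-encoded as UTF-8, with the XML declaration updated."""
    m = DECL_ENCODING.search(data, 0, 300)
    declared = m.group(3).lower() if m else None
    if declared in LATIN1_NAMES:
        text = data.decode("iso-8859-1")
    else:
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            # Undeclared or mislabeled Latin-1: every byte maps to a code point.
            text = data.decode("iso-8859-1")
    out = text.encode("utf-8")
    if m:
        # Rewrite only the value inside the declaration so it matches the new bytes.
        out = DECL_ENCODING.sub(rb"\1\2UTF-8\2", out, count=1)
    return out

def needs_entity_resolution(data):
    """True if a namespace URI is written as an entity reference like "&ns_svg;"."""
    return ENTITY_XMLNS.search(data) is not None
```

Files flagged by needs_entity_resolution() would then be candidates for the xmllint --noent pass.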
>
> Going forward the syncing between database tags and .svg files is not
> yet automated, but we can run the update script every month or so.
> I think it's definitely worth our time to work a bit on improving
> the tags now, if more librarians feel motivated - please do so.
>
> http://openclipart.org/tags/clipart_issue
> http://openclipart.org/listnotags
> http://openclipart.org/listnodescription
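The update script itself isn't shown in the thread. Here is a rough sketch of what one sync step might look like. It assumes the tags are stored as Inkscape-style RDF keywords (dc:subject/rdf:Bag/rdf:li inside the SVG's <metadata> element), and sync_tags() is a hypothetical name. Given one file and the tag list pulled from the database, it replaces whatever keywords the file currently carries. Python's ElementTree drops DOCTYPE declarations and comments when it re-serializes a file, so a production script would need more care than this.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by Inkscape-style SVG metadata (RDF, Dublin Core, Creative Commons).
NS = {
    "svg": "http://www.w3.org/2000/svg",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "cc": "http://creativecommons.org/ns#",
}
for prefix, uri in NS.items():
    # Keep readable prefixes (SVG as the default namespace) when writing the tree back.
    ET.register_namespace("" if prefix == "svg" else prefix, uri)

def q(prefix, local):
    """Build a Clark-notation tag such as '{http://...}Bag'."""
    return f"{{{NS[prefix]}}}{local}"

def child(parent, tag):
    """Return the first direct child with this tag, creating it if missing."""
    found = parent.find(tag)
    if found is None:
        found = ET.SubElement(parent, tag)
    return found

def sync_tags(svg_text, tags):
    """Replace the file's RDF keyword list with `tags` and return the new SVG text."""
    root = ET.fromstring(svg_text)
    metadata = child(root, q("svg", "metadata"))
    rdf = child(metadata, q("rdf", "RDF"))
    work = child(rdf, q("cc", "Work"))
    subject = child(work, q("dc", "subject"))
    bag = child(subject, q("rdf", "Bag"))
    for li in list(bag):
        bag.remove(li)
    for name in tags:
        ET.SubElement(bag, q("rdf", "li")).text = name
    return ET.tostring(root, encoding="unicode")
```

Running this over every file once a month would match the cadence suggested above. Only files whose tags actually changed would need to be rewritten.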
>
> - case in filenames
> I found 74 (=37*2) files where the filename differs only in case.
> See
> http://openclipart.org/filenames_with_case_diff.txt
>
> These files are already triggering bugs in our mysql processing,
> and they would most likely cause trouble on some Windows systems
> as well. Maybe going forward we should adopt a policy to not
> allow multiple uploads by the same user where the only difference
> in filename is case? I will go through these 74 files to see whether
> they are duplicates, then either pick a winner or rename the second
> one to _2 or so.

Sounds good.
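The filename check and the proposed upload policy can be sketched in a few lines. All three function names are hypothetical, and the sketch assumes paths look like "user/filename.svg". case_collisions() groups names that are equal once lower-cased within the same user's directory. upload_allowed() rejects a new upload that matches an existing name when case is ignored. suffixed() produces the "_2" rename mentioned above.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def case_collisions(paths):
    """Group paths in the same directory whose filenames differ only in case."""
    groups = defaultdict(list)
    for p in paths:
        pp = PurePosixPath(p)
        groups[(str(pp.parent), pp.name.lower())].append(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]

def upload_allowed(existing_names, new_name):
    """Proposed policy: reject a name matching one of the user's files, ignoring case."""
    key = new_name.lower()
    return all(name.lower() != key for name in existing_names)

def suffixed(name, n=2):
    """Rename e.g. 'car.svg' to 'car_2.svg' to resolve a collision."""
    pp = PurePosixPath(name)
    return str(pp.with_name(f"{pp.stem}_{n}{pp.suffix}"))
```

Grouping per directory matches the "same user" wording of the policy. Two different users can still each have a car.svg.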

>
> - case in tags
> There are about 33,000 different tags in use. What do people think
> about lower-casing all A-Z characters in the tags? That way "Car"
> and "car" would become the same tag.

This is a good idea. Actually, normalizing clipart in general is a good thing to do.
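The proposal is to lower-case only the characters A-Z. Python's str.lower() would also change non-ASCII letters such as "Ä", so the sketch below uses a translation table restricted to ASCII. merge_tags() is a hypothetical helper that folds the usage counts of tags that become identical, such as "Car" and "car".

```python
import string
from collections import Counter

# Map only A-Z to a-z, as proposed; every other character is left unchanged.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def normalize_tag(tag):
    return tag.translate(ASCII_LOWER)

def merge_tags(tag_counts):
    """Combine usage counts of tags that become identical after lower-casing."""
    merged = Counter()
    for tag, count in tag_counts.items():
        merged[normalize_tag(tag)] += count
    return merged
```

Comparing len(tag_counts) with len(merge_tags(tag_counts)) on the real tag table would show how many of the roughly 33,000 tags would collapse before anything is changed.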

> - xml
> Do we have some XML experts here who have preferences wrt DTD,
> Inkscape/sodipodi/Adobe namespaces, etc.?

Whatever is most compatible.

Jon



--
Jon Phillips 王✳ http://fabricatorz.com ✳ skype: kidproto ✳ irc: rejon
+1.415.830.3884 (global) ✳ +86-187-1003-9974 (beijing)

