[Clipart] help for the project (at least next release)

Alan alan at ccnsweb.com
Tue Sep 4 18:36:18 PDT 2007


    Below I've summarized some of our previous conversations. (Maybe we
    could start putting some of this on a wiki?)

    Four Primary Requirements for Open Clipart

1.  Easy to find and download graphics 


    Since the sheer size of the package makes downloading all of the
    graphics impractical, perhaps it would be better to design an
    installable executable that contains field-searchable thumbnails and
    an update function. I have sent a mockup as a pdf.

    Scenario: The user downloads and installs the utility, which then
    automatically begins downloading and updating optimized thumbnail
    graphics (png?). The utility also updates all of the search criteria
    already attached to each graphic. The user searches for a particular
    graphic using the local utility and views the matching thumbnail(s),
    each of which also contains a link to the full svg graphic. Beside
    each thumbnail in the utility is a checkbox, and the user ticks the
    box for every graphic he/she wants. Once all selections are made,
    the user clicks a download button and the full svg graphics are
    downloaded from the server.

    There is a legend below the checkbox for each thumbnail. This legend
    indicates whether the graphic has already been downloaded and
    whether updates are available for it. There is also an option
    (drop-down list) to choose which personal storage folder to add it
    to.

    (The user can create personal storage folders for downloaded
    graphics. These then become available under the thumbnails.)
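    The legend logic could be as small as the Python sketch below. It
    assumes the server keeps a revision number for each graphic, which
    the proposal does not specify. The `Status` and `legend` names are
    hypothetical.

```python
from enum import Enum


class Status(Enum):
    NOT_DOWNLOADED = "not downloaded"
    UP_TO_DATE = "downloaded"
    UPDATE_AVAILABLE = "update available"


def legend(key, local_revisions, server_revisions):
    """Legend text for one thumbnail.

    Both arguments map a graphic's permanent id to a revision number;
    local_revisions only holds graphics the user has already downloaded.
    """
    local = local_revisions.get(key)
    if local is None:
        return Status.NOT_DOWNLOADED
    # A graphic missing from the server index is treated as unchanged.
    if server_revisions.get(key, local) > local:
        return Status.UPDATE_AVAILABLE
    return Status.UP_TO_DATE


server = {"witch": 3, "broom": 1, "pumpkin": 2}
local = {"witch": 2, "broom": 1}
for key in server:
    print(key, legend(key, local, server).value)
```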

    These thumbnail graphics have a "Thumbprint": a unique id that
    permanently identifies the original graphic. I would suggest the
    thumbprint be a time-date-author stamp, which should make each id
    unique. The stamp should be applied automatically when the graphic
    is first submitted. Should the time stamp include seconds or even
    milliseconds?
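    Here is a minimal Python sketch of such a stamp. It assumes the
    server records submissions in UTC with millisecond precision. The
    `make_thumbprint` name and the layout of the id are hypothetical.
    Two submissions by the same author within the same millisecond would
    still collide, so a real server might append a counter.

```python
from datetime import datetime, timezone


def make_thumbprint(author: str, submitted: datetime) -> str:
    """Build a time-date-author stamp such as '20070904T183618123Z-alan'.

    `submitted` should be timezone-aware; it is converted to UTC so that
    stamps made on servers in different zones still sort correctly.
    """
    utc = submitted.astimezone(timezone.utc)
    # Date and time down to the second, then milliseconds and a UTC marker.
    stamp = utc.strftime("%Y%m%dT%H%M%S") + f"{utc.microsecond // 1000:03d}Z"
    # Reduce the author name to lowercase letters, digits and underscores
    # so the id is easy to use in file names and URLs.
    slug = "".join(c if c.isalnum() else "_" for c in author.strip().lower())
    return f"{stamp}-{slug}"


print(make_thumbprint("Alan", datetime.now(timezone.utc)))
```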

    These thumbnails also carry the same search fields as the original
    svg graphic. Much thought should go into choosing the search fields;
    some of the more obvious ones are author, date and content. The
    search fields can be part of a drop-down list, where the user picks
    an item such as author and then enters keywords to the right. There
    should also be a way to add or remove search fields in order to
    narrow the list even further. (See "more" / "less" in the mockup.)
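    The field-plus-keywords search could be sketched in Python as below.
    It assumes each search field is stored as a plain string and that a
    keyword matches by case-insensitive substring. The `Graphic` type
    and that matching rule are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Graphic:
    thumbprint: str
    # Search fields such as author, date and content, keyed by field name.
    fields: dict = field(default_factory=dict)


def matches(graphic, criteria):
    """True if every keyword of every (field, keywords) criterion occurs,
    case-insensitively, in the graphic's value for that field."""
    return all(
        all(word.lower() in graphic.fields.get(name, "").lower()
            for word in keywords.split())
        for name, keywords in criteria
    )


def search(graphics, criteria):
    # Each added criterion can only shrink the result; removing one widens it.
    return [g for g in graphics if matches(g, criteria)]


library = [
    Graphic("a", {"author": "Zeimusu", "content": "witch on broom halloween"}),
    Graphic("b", {"author": "Zeimusu", "content": "pumpkin halloween"}),
    Graphic("c", {"author": "Alan", "content": "broom"}),
]
print([g.thumbprint for g in search(library, [("author", "zeimusu")])])  # ['a', 'b']
print([g.thumbprint for g in search(library, [("author", "zeimusu"),
                                              ("content", "broom")])])  # ['a']
```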

    Later on, could the utility perhaps be integrated into OpenOffice?

2.  Easy for the server to handle all required tasks


    As the discussion below shows, creating and updating thumbnails can
    be resource intensive. What is the best way to handle this?

    If the files could be decentralized, as in a peer-to-peer search
    utility, the download problems with large files would go away, but
    security might become an issue for the user.

3.  Easy to administer


    Everything needs to be set up so that updating the database with
    search criteria and graphics is as easy as possible.

    Is there a backup in place for server content?

4.  Easy to add new graphics


    This seems to already work pretty well.





---------------------------
The more we can work with the materials already at hand, the easier it
will probably be.  All of the graphics are already available as small,
bite-size png chunks.  I'm not sure how everything is currently set up,
but how about assigning each png graphic a descriptive, permanent name
that then becomes a permanent key?  Perhaps the key could be a numeric
sequence built from the date and time of submission and/or acceptance
into the database, which would practically guarantee a unique key for
each graphic.  That permanent sequence can then be attached to all
kinds of information, including search data (descriptive terms, size,
colors, author, date, whatever).

In a local utility, the same sequence could drive the png download used
for quick local searches, and it could also identify the svg download.
Your update function in the local utility could update one or more
text-based files holding the search fields any time you wanted: quick
downloads with easy updates to search functionality.  The png graphics
for local search would also be quick downloads in small chunks.  The
biggest update, of course, would be the first one, when the utility and
the bulk of the png graphics are downloaded.  After that, updates would
be just very small chunks of search-field changes and newly added png
graphics.  When one or more graphics are chosen for svg download, the
choices are passed to a component of the utility that knows to look up
the svg attached to each permanent key.
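The incremental update could work like the Python sketch below. The
proposal only says "a text based file or files", so the JSON-lines
format, the `apply_update` and `svg_url` names, and the example.org URL
scheme are all assumptions made for illustration.

```python
import json


def apply_update(index, update_text):
    """Merge a JSON-lines update file into the local search index.

    `index` maps a permanent key to a dict of search fields. Each line is
    either {"key": ..., "fields": {...}} to add or change fields, or
    {"key": ..., "deleted": true} to drop a graphic.
    """
    for line in update_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        key = entry["key"]
        if entry.get("deleted"):
            index.pop(key, None)
        else:
            # Only the fields named in the update change; the rest are kept.
            index.setdefault(key, {}).update(entry.get("fields", {}))
    return index


def svg_url(key, base="https://example.org/svg/"):
    # Hypothetical URL scheme: the permanent key alone locates the svg.
    return f"{base}{key}.svg"


index = {"20070904T183618123Z-alan": {"author": "Alan", "tags": "broom"}}
update = "\n".join([
    '{"key": "20070904T183618123Z-alan", "fields": {"tags": "broom,witch"}}',
    '{"key": "20070905T090000000Z-greg", "fields": {"author": "Greg"}}',
])
apply_update(index, update)
print(sorted(index))
print(svg_url("20070905T090000000Z-greg"))
```

Because each update line touches only the keys it names, the first
download can be large while later updates stay small.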

This allows complete flexibility in updating the search fields, easy
identification of graphics by a permanent key, quick local updates
after the initial installation, offloading of search and storage to the
local system, faster apparent searches thanks to local storage, and
redundancy should the web or the server slow down.  It also lets the
user fall back on the png graphics if the svg is not immediately
available.

I'm sure I've missed things, and I'm not a programmer (yet), but would 
it be possible to set this up?

Alan

------------------

You're thinking in terms of "on-demand" PDFs.  What if those PDFs were
generated once a week, say at 1 a.m. on Sunday morning when server load
is low?

Another question is what package you were using to generate the PDFs. 
Some are faster and/or lower-demand than others.  It might be worth 
exploring a few different methods and performance testing/tuning them to 
determine which one brings in the best combo of speed and CPU load.

Perhaps the best bet is not to create the PDFs directly from SVG, but
to use a two-stage process: first convert all the SVGs into JPEG using
Batik, ImageMagick or command-line Inkscape, then build the PDFs from
the JPEG images.  That might be faster because you can use the fastest
method for the SVG-to-JPEG conversions, and then the fastest method for
generating the PDFs.
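The Python sketch below shows how the two stages can be kept separate
so that each can be timed and swapped on its own. It assumes
ImageMagick, whose `convert input output` form can rasterize an SVG and
can also join several images into one PDF. Batik and Inkscape
command-line flags differ between versions, so they are left out. The
function names are hypothetical.

```python
import subprocess
from pathlib import Path


def two_stage_commands(svg_files, work_dir, pdf_path, tool="convert"):
    """Argument lists for an SVG -> JPEG -> PDF catalogue build.

    Stage 1 rasterises each SVG once; stage 2 joins the JPEGs into one PDF.
    `tool` defaults to ImageMagick 6's `convert`; ImageMagick 7 uses `magick`.
    JPEGs are named after the SVG's stem, so two SVGs with the same stem
    would overwrite each other (naming them by permanent key avoids this).
    """
    work = Path(work_dir)
    jpegs = [work / (Path(svg).stem + ".jpg") for svg in svg_files]
    stage1 = [[tool, str(svg), str(jpg)] for svg, jpg in zip(svg_files, jpegs)]
    stage2 = [[tool, *(str(j) for j in jpegs), str(pdf_path)]]
    return stage1, stage2


def run(commands):
    # check=True stops the build at the first failed conversion.
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

You could time `run(stage1)` with different converters to pick the
fastest rasterizer, then do the same for `run(stage2)`.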

Or... offload portions of the processing to the user's machine.

Once you've built some XML indices and bitmaps of the images, you could
build a collection-browser frontend that offloads a large chunk of the
processing to a client-side interface in Flash.

For example, if you wanted a tag browser, you could create an XML file like this:

    <item>
      <itemtitle>Witch on Broom</itemtitle>
      <author>Zeimusu</author>
      <date>12-12-2006</date>
      <svgurl>http://...</svgurl>
      <bitmapurl>http://...</bitmapurl>
      <thumburl>http://...</thumburl>
      <tags>witch,broom,Halloween,clip_art</tags>
    </item>

For the current collection of 3000 or so tagged images, the XML file
might be 1.5-2 MB.  Use zlib compression (Flash 9 / ActionScript 3 has
zlib compression support, IIRC) to compress the XML file.

The frontend can decompress the file, then parse the XML into Tag and
Author arrays, which can be used to generate thumbnail indices for all
the tags and authors, a higher-res (say 400x400) preview of individual
images, and an SVG download link for the user.  Generate the XML file
and bitmaps nightly during a slower period.
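The decompress-and-index step could look like the sketch below. Python
stands in here for the ActionScript 3 client. The `<items>` root element
is an assumption, since the snippet above shows only a single `<item>`.
Lowercasing the tags is also a design choice rather than part of the
proposal.

```python
import xml.etree.ElementTree as ET
import zlib
from collections import defaultdict

SAMPLE = """<items>
  <item>
    <itemtitle>Witch on Broom</itemtitle>
    <author>Zeimusu</author>
    <date>12-12-2006</date>
    <svgurl>http://...</svgurl>
    <bitmapurl>http://...</bitmapurl>
    <thumburl>http://...</thumburl>
    <tags>witch,broom,Halloween,clip_art</tags>
  </item>
</items>"""


def build_indices(compressed):
    """Decompress the catalogue and index item titles by tag and by author."""
    root = ET.fromstring(zlib.decompress(compressed))
    by_tag, by_author = defaultdict(list), defaultdict(list)
    for item in root.iter("item"):
        title = item.findtext("itemtitle", "")
        by_author[item.findtext("author", "")].append(title)
        for tag in item.findtext("tags", "").split(","):
            if tag.strip():
                by_tag[tag.strip().lower()].append(title)
    return dict(by_tag), dict(by_author)


catalogue = zlib.compress(SAMPLE.encode("utf-8"), 9)  # done nightly on the server
tags, authors = build_indices(catalogue)  # done in the client
print(tags["halloween"])  # ['Witch on Broom']
```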

Make the XML file structure available and let people play with 
developing client side browsers based on the XML.  Heck, you might make 
the browser a Flash component and let people have fun doing mash-ups 
that incorporate the component.

OTOH, there's also the option of making sure Google has an accurate 
sitemap of all the different pages you want to offer, then using an 
embedded Google Search (using the Google AJAX search API or AdSense for 
Search) to offload some of the search stress to Google.
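Generating such a sitemap is straightforward. The Python sketch below
follows the sitemaps.org 0.9 format that Google reads. The page URLs
are hypothetical, and the `lastmod` values must be in YYYY-MM-DD form,
so a date like the `12-12-2006` above would need converting first.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for (url, lastmod) pairs, lastmod as 'YYYY-MM-DD'."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and <
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([("https://example.org/clipart/witch-on-broom", "2006-12-12")]))
```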


- Greg


Alan wrote:


> "would be not too difficult"
>
> One of the really great things about open source software is that it 
> truly benefits from ease of use and simplicity.   Highly effective 
> ideas made as effective/transparent/simple as possible is true 
> genius.  Open source software caters to this model because there is 
> nothing to hide and nothing to protect.
>
> Be the best you can be, because open source encourages exactly this.
>
>
Alan, I already tried to build PDF catalogues from the last publicly
released package, 0.18. Writing the code wasn't a big problem: the
problem was the really large number of clip art images already in OCAL.
Generating PDFs on the fly took too much time and CPU. And OCAL keeps
growing. So I think the only way to produce PDFs from OCAL would be
offline. To do that, I would need information about the structure of
the database, how files are stored and referenced, and so on.

Best,

Tom


Jon Phillips wrote:

> On Thu, 2007-05-10 at 11:02 -0600, Alan wrote:
>> I've been thinking.
>>
>> Your server is going to get hammered every time people search it
>> for particular graphics.   I like the idea of setting up a thumbnail
>> database and a search function to find particular photos, or you could
>> organize by type as well.
>>
>> Would it be possible to create a utility which downloads to a user's
>> computer the thumbnail database and other search criteria?  This would
>> allow searches to take place locally.  When a user has decided what they
>> want, the utility will connect to the server and download what they
>> want.  The local utility could be set up to link to a complete download,
>> a set of chosen graphics, or a single picture at a time.
>>
>> This will reduce the workload on the server and allow users to browse
>> through the thumbnail photo collection at their leisure on their own
>> computer.
>
> This is a great idea. Would you like to help realize this?
>
> Bryce Harrington (on the list) has been working on Inkscape + Open Clip
> Art Library integration, and this is definitely something we have talked
> about.
>
> Jon!


John Olsen wrote:
>
> I agree with Jon that online searching and use of the archive is  
> probably more likely in the future.  Pointing at resources on the web  
> rather than downloading them all seems like the trend.  Maybe it is  
> the visualist in me, but for a library of any art I think a browser
> with thumbnails is the most important feature.  It is currently quite
> tedious to drill down to each piece of art then back up again.  Even  
> a next and previous button might help here.  But clearly being able  
> to see the collection displayed as a gallery would be a huge  
> improvement.
>
> And even though the classic method for organizing files is folder
> and sub-folder, the tagging system seems far more dynamic and
> lets everyone custom-tailor their results.
>
> BTW, I have the cleanup under 100 items now, but some of them are not  
> fixable by me.  I think they were made with the beta of Inkscape or  
> something as I can find no way to clean them up without breaking them.
>