[Clipart] bulk upload

John Olsen johnny_automatic at mac.com
Fri Nov 23 18:43:40 PST 2007


The .txt files were generated by the old set-up I think.  That's why  
I included them in my sort.  I think there is pretty much a .txt file  
for every svg and they were generated on upload with the old engine.   
This unfortunately is not true of the incoming archive.  But that  
is a much smaller collection, and quite frankly many of the files are  
garbage.  It remains totally unsorted and has not been cleaned up.
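
A quick way to check the "pretty much a .txt file for every svg" claim  
before anyone automates the upload is sketched below.  It is only a  
sketch: it assumes the companion file sits next to the image and is  
named like the image with .txt in place of .svg (the old engine may  
have named them differently), and svgs_missing_txt is a made-up name.

```python
from pathlib import Path


def svgs_missing_txt(root):
    """Return sorted paths of .svg files under root that have no .txt companion.

    Assumption: "image.svg" pairs with "image.txt" in the same folder.
    """
    missing = []
    for svg in Path(root).rglob("*.svg"):
        # with_suffix swaps ".svg" for ".txt" while keeping the folder and stem
        if not svg.with_suffix(".txt").exists():
            missing.append(svg)
    return sorted(missing)
```

Running it once over the sorted author folders and once over the  
incoming archive would show how many files need their information  
filled in some other way.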

John

On Nov 23, 2007, at 6:19 PM, Roan Horning wrote:

> Hi John,
>
> I just downloaded the "Aaron Johnson.zip" file from your .Mac page.  
> It looks like the svg files have embedded in them the  
> information that is in the <image name>.txt files. Did you notice  
> this in your clean up, or was this part of your clean up process?  
> If the information is already in the files it would make it  
> slightly easier to automate the process.
>
> We could minimize the pain by splitting the uploads into groups.  
> Active artists with only a few images in the old archive. These  
> people we could encourage to do any touch up work needed and reload  
> the files themselves.
>
> Active artists with bunches of work to upload. We batch upload the  
> files and have them added to their account.
>
> Inactive artists, we could create accounts for each of them with  
> some sort of designator in the account name that they are unclaimed/ 
> inactive, and a notice in their profiles. The site librarians could  
> have access to these accounts, so they can clean up the files as  
> needed. If someone comes along to claim the work, we can either  
> give them control of the account, or have them create a new account  
> and upload the files to it.

Currently librarians can edit any file, so this is not a problem.  The  
only cost is the time it takes us to do it.  Usually we try the "teach  
a man to fish" approach by explaining what needs fixing to the  
submitter, but when that doesn't work we fix it ourselves.

Your suggested break up of the workload sounds good.  I was thinking  
similarly.

>
> You've done a great job of splitting the files up by author. We  
> need a script that takes the list of inactive artists and loops  
> through it, calling the code that creates a new account. Once we  
> have the new accounts created and their associated directories, we  
> can unzip the archives into the proper accounts. Then we need  
> another script that can read the file information and put it into  
> the right tables in the database. I'm happy to work on this. It  
> would be great to have all the older work available to the new  
> site, and a shame to have your efforts languish in limbo.

Thanks.  In my manual sort I tried to collect all the files by the  
same author in one folder - we have several cases where the same  
author has used different names to upload.  Where this is obvious, as  
with Francesco Rollandin, I combined them into one folder (so that  
might be a case where a script would divide them back into what it  
sees as multiple authors based on the author field), but there  
certainly may be others that only the author would know for sure.  
They can either live with the multiple names or delete and replace  
files accordingly.
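
To make the author-field concern concrete, here is a rough sketch of  
how a script might group files by the embedded author while still  
honouring hand-merged names.  It assumes the author is stored as  
dc:creator / dc:title in the SVG metadata block (I have not checked  
every file), and the function names and the aliases table are  
invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Dublin Core namespace, assumed to be what the embedded metadata uses
DC = "{http://purl.org/dc/elements/1.1/}"


def author_of(svg_text):
    """Return the first non-empty creator name in the SVG metadata, or None."""
    root = ET.fromstring(svg_text)
    for creator in root.iter(DC + "creator"):
        # The name is assumed to be nested in a dc:title; fall back to plain text
        title = creator.find(".//" + DC + "title")
        name = title.text if title is not None else creator.text
        if name and name.strip():
            return name.strip()
    return None


def group_by_author(svgs, aliases=None):
    """Group file names by author.

    svgs:    mapping of file name -> SVG text
    aliases: optional mapping of alternate name -> preferred name, so that
             names already merged by hand are not split apart again
    """
    aliases = aliases or {}
    groups = defaultdict(list)
    for filename, text in svgs.items():
        name = author_of(text) or "unknown"
        groups[aliases.get(name, name)].append(filename)
    return dict(groups)
```

Without an aliases table the script would see each spelling as a  
separate author; with one, the folders I merged by hand stay merged.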

The tags from the old site will be the minimal ones the directory  
tree had there, but that is a minor issue that can be fixed by hand  
where it is a problem.  For example, a number of the files have  
"unsorted" as their only keyword.  But they could be sorted by that  
keyword and fixed that way.
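
Finding those files could look something like the sketch below.  It  
assumes the keywords live in dc:subject as a list of rdf:li entries in  
the SVG metadata, which should be verified against real files first;  
keywords_of and only_unsorted are placeholder names.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for the embedded metadata
DC = "{http://purl.org/dc/elements/1.1/}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def keywords_of(svg_text):
    """Return the lower-cased keywords found under dc:subject, in document order."""
    root = ET.fromstring(svg_text)
    words = []
    for subject in root.iter(DC + "subject"):
        for item in subject.iter(RDF + "li"):
            if item.text and item.text.strip():
                words.append(item.text.strip().lower())
    return words


def only_unsorted(svgs):
    """Return sorted file names whose one and only keyword is "unsorted".

    svgs: mapping of file name -> SVG text
    """
    return sorted(
        name for name, text in svgs.items()
        if set(keywords_of(text)) == {"unsorted"}
    )
```

The resulting list is the hand-fix queue; files with no keywords at  
all are deliberately left out and would need their own pass.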

>
> I think I have all the access that is needed, and can test  
> everything on my own cchost setup, and the test server, before  
> doing anything on the production site. We need a consensus on the  
> best way to reflect that the artist's account has been automatically  
> added to the site. Does anyone see any problems with this plan?
>
> --Roan
>
>

John


