[Clipart] Malware in clipart

Tue Mar 15 22:04:44 PST 2005

Jon Phillips wrote:

>>>It seems to me that we will not have the resources to hand-examine
>>>every submission to ensure it is innocuous, so (barring an
>>>earthshattering breakthrough in AI research) if we take any
>>>precautions at all it will have to be stripping out all scripts of any
>>>kind, malware or not.  (Which, on the whole, doesn't sound like a
>>>terribly bad idea to me...  feel free to jump in and explain why we
>>>shouldn't do that, if you can think of any solid reasons.)
>>
>>Another approach could be to simply slap a keyword on it, e.g. 'script'
>>or 'executable', and exclude all such images from the releases.  Then,
>>if people are so motivated, they can individually review/approve them.
>>
>>There's four reasons for this suggestion:
>>
>>First, presumably if an SVG includes a non-malware script, it's probably
>>there for a reason, such as for animation.  In this case, removing the
>>script may invalidate the image, in which case a "stripped" version
>>could be worthless anyway.

Yes and no.  Currently, barely anything supports scripts, so this is highly 
speculative.  But if it's like the Web, there are enough non-script and 
script-disabled browsers that a reasonable fallback will often be provided.

>>Second, if someone goes to the trouble of writing a script AND
>>submitting it to OCAL, and then the script gets stripped out, my guess
>>is that they're going to come and complain.  Having a procedure that
>>allows you to review/approve individual images on a case-by-case basis
>>will enable the project to handle these situations professionally, and
>>not as "special exceptions".

If we make a "validation" step part of the process (upload it, look at a 
rendered PNG, look at the final version, approve) then it will catch both 
people whose scripts are mangled by this process and people submitting 
non-Inkscape files that look wrong when rendered by our de-facto standard 
renderer.

People who intentionally add scripts expect them to stay.  However, programs 
may start adding "helpful" script elements which don't do anything useful but 
trigger our detection; in this case automatic removal is a good idea.

>>Third, this approach remains consistent with the process for handling
>>other types of abusive images, so hopefully would reduce the variety of
>>scripts needed to be written/maintained.

A "validate and normalize" script seems like a good idea, enforcing the 
presence of metadata (and maybe a formalized PD declaration) and various 
security invariants; we could also give people a chance to view their image 
as rendered by Inkscape (and possibly some other renderers) to catch partially 
broken submissions.
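
To make the idea concrete, here is a rough sketch of what the checking half 
of such a "validate and normalize" step could look like in Python.  The 
function name check_submission and the particular checks are assumptions made 
for illustration, not an agreed policy, and the standard-library parser used 
here is not hardened against maliciously constructed XML.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def check_submission(svg_text):
    """Return a list of human-readable problems; an empty list means it passed.

    Hypothetical helper: the checks below are illustrative assumptions.
    """
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError as err:
        return ["not well-formed XML: %s" % err]

    problems = []
    if root.tag != "{%s}svg" % SVG_NS:
        problems.append("root element is not <svg> in the SVG namespace")
    # Only a top-level <metadata> element is looked for here.
    if root.find("{%s}metadata" % SVG_NS) is None:
        problems.append("no <metadata> element")
    for elem in root.iter():
        if elem.tag == "{%s}script" % SVG_NS:
            problems.append("contains a <script> element")
        for name in elem.attrib:
            # Crude heuristic: treat any on* attribute as an event handler.
            if name.lower().startswith("on"):
                problems.append("event handler attribute: %s" % name)
    return problems
```

An upload page could show the returned list to the submitter next to the 
rendered PNG, so that both kinds of problem are caught in one pass.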

>>Finally, if in fact a given SVG image also has a piece of malware
>>scripted into it, how likely are we going to want to keep the image
>>portion?  Stripping the viral part of an email virus wouldn't turn it
>>into a useful email.  ;-)  I wouldn't think stripping malware from SVG
>>would result in a worthwhile SVG image, either.

If it actually has malware, probably not - although if that malware is an 
actual virus, it may have been added after the fact to a perfectly normal SVG file.

>>Again, I think instead of stripping, just flag the images and filter
>>them out for the releases.  This should be a simpler thing to do, and
>>requires only small tweaks to the existing tools, plus a simple script
>>to scan the SVG for keywords ("<SCRIPT...", etc.) and if it comes up
>>positive, move those files aside, and/or add a keyword to them.  Then,
>>in the release scripts, add another filter like the one for the flags,
>>to exclude images with scripts in them.
> 
> Yes, I agree with this approach. I'm adding to the roadmap. Does anyone
> want to conquer this task? Andrew, would you like to conquer this one?
> We should figure out how to add this to validation of SVG files once
> input into the site and then again when doing a release. We should
> discuss this further? Any specific suggestions from anyone on
> implementation.
> 
> Please check roadmap: http://www.openclipart.org/cgi-bin/wiki.pl?Roadmap
> 
> We are in need of some good soldiers to help with these tasks.
> 
> Jon

I don't know about "conquer", but I did improve my script; I attached it to 
another email on the list, but see also 
http://en.wikipedia.org/wiki/User:Aarchiba/SVG_sanitizer (where I will post 
updated versions).

It takes an SVG file on standard input; it sends a sanitized version to 
standard output, and returns a nonzero exit status if anything dubious was found.
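
For readers who want a feel for that interface, a filter of this shape might 
look roughly like the following.  This is a minimal sketch and NOT the script 
linked above: the names sanitize and run are made up for illustration, it 
assumes pure SVG, and treating every on* attribute as an event handler is a 
crude heuristic.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def sanitize(svg_text):
    """Return (clean_text, dubious) after dropping scripts and on* attributes."""
    # Serialize SVG elements without a prefix (this is a global setting).
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)
    dubious = False
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "{%s}script" % SVG_NS:
                parent.remove(child)
                dubious = True
        for name in list(parent.attrib):
            if name.lower().startswith("on"):
                del parent.attrib[name]
                dubious = True
    return ET.tostring(root, encoding="unicode"), dubious


def run(instream, outstream):
    """Filter instream to outstream; return the exit status to use."""
    clean, dubious = sanitize(instream.read())
    outstream.write(clean)
    return 1 if dubious else 0
```

A command-line wrapper would call sys.exit(run(sys.stdin, sys.stdout)), so 
that a release script can test the exit status and move flagged files aside.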

It DOESN'T do anything sensible if the document is not pure SVG; if you've used 
namespaces or included some other kind of XML, all bets are off (see the 
source).  So it's not ready for prime time.  I need to read more about XML 
before I can handle those cases properly.
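
One possible stopgap is to at least detect documents that are not pure SVG and 
flag them for review.  The sketch below does that with a namespace-aware 
parser; the function names and the contents of the ALLOWED set are assumptions 
for illustration (editor-specific and metadata namespaces would be flagged 
unless added to it).

```python
import xml.etree.ElementTree as ET

# Assumed allowlist; anything else is reported as foreign.
ALLOWED = {
    "http://www.w3.org/2000/svg",
    "http://www.w3.org/1999/xlink",
    "http://www.w3.org/XML/1998/namespace",
}


def namespace_of(name):
    """Split ElementTree's '{uri}local' form; return '' for unqualified names."""
    if name.startswith("{"):
        return name[1:].split("}", 1)[0]
    return ""


def foreign_namespaces(svg_text, allowed=ALLOWED):
    """Return a sorted list of namespaces used outside the allowlist."""
    root = ET.fromstring(svg_text)
    found = set()
    for elem in root.iter():
        ns = namespace_of(elem.tag)
        if ns not in allowed:
            found.add(ns or "(no namespace)")
        for attr in elem.attrib:
            # Unqualified attributes belong to their element, so skip them.
            ns = namespace_of(attr)
            if ns and ns not in allowed:
                found.add(ns)
    return sorted(found)
```

An empty result means the file uses only the allowed namespaces; anything else 
could be given the same keyword as scripted images and held back from releases.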

Andrew