[poppler] There is a flaw with poppler that needs to be fixed. Deleted annotations are not actually deleted. I require assistance in fixing this.

Leonard Rosenthol lrosenth at adobe.com
Fri Sep 23 18:56:09 UTC 2022


Zeke, the issue here isn’t specific to the deletion of annotations; it comes from the PDF file format and its support for “incremental updates”.

When changes to a PDF are saved, they can simply be appended to the file as an incremental update section, which contains the new or changed objects together with a list of deleted objects.  The bytes already in the file, including the objects now marked as deleted, are left untouched.  This is the most common way to save because it is faster.  You will find that nearly all PDF processing tools do this by default.
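
To make that concrete, here is a toy sketch of what an incremental save does to the bytes on disk. It is not Poppler code: ToyPdf, writeOriginal, appendIncrementalDelete and rawBytesContain are hypothetical names, and the "PDF" strings are schematic rather than valid PDF syntax.

```cpp
#include <cassert>
#include <string>

// Toy model of a PDF file: just the bytes on disk (schematic, not valid PDF).
struct ToyPdf {
    std::string bytes;
};

// First save: page object 4 references annotation object 5, which holds the text.
ToyPdf writeOriginal(const std::string &noteText)
{
    ToyPdf pdf;
    pdf.bytes = "%PDF-1.7\n"
                "4 0 obj << /Type /Page /Annots [5 0 R] >> endobj\n"
                "5 0 obj << /Type /Annot /Contents (" + noteText + ") >> endobj\n"
                "xref ... trailer ... %%EOF\n";
    return pdf;
}

// Incremental save after deleting the annotation: nothing already written is
// touched. A new section is appended with an updated page object and an xref
// entry that marks object 5 as free.
void appendIncrementalDelete(ToyPdf &pdf)
{
    const std::size_t before = pdf.bytes.size();
    pdf.bytes += "4 0 obj << /Type /Page /Annots [] >> endobj\n"
                 "xref\n5 1\n0000000000 00001 f \n"
                 "trailer ... /Prev ... %%EOF\n";
    assert(pdf.bytes.size() > before); // the file only ever grows
}

// What a text editor or a forensic tool sees: a plain search over raw bytes.
bool rawBytesContain(const ToyPdf &pdf, const std::string &needle)
{
    return pdf.bytes.find(needle) != std::string::npos;
}
```

After writeOriginal("my secret note") followed by appendIncrementalDelete, a viewer no longer shows the annotation, but rawBytesContain(pdf, "my secret note") is still true, which is the behaviour you are describing.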

Alternatively, software could do a “full save” or a “Save As”, where objects no longer in use are “garbage collected”.  Poppler does not offer this option.
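
For comparison, a full save with garbage collection can be pictured roughly as follows. Again this is only a toy sketch with hypothetical names (ToyObject, ToyDocument, markReachable, fullSave), not an existing API: start at the document root, follow references, and write out only the objects that were reached.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy object: some content plus the object numbers it references.
struct ToyObject {
    std::string body;
    std::vector<int> refs;
};

using ToyDocument = std::map<int, ToyObject>;

// Mark phase: collect every object reachable from object `num`.
void markReachable(const ToyDocument &doc, int num, std::set<int> &seen)
{
    auto it = doc.find(num);
    if (it == doc.end() || !seen.insert(num).second) {
        return; // dangling reference, or already visited
    }
    for (int ref : it->second.refs) {
        markReachable(doc, ref, seen);
    }
}

// Sweep phase: the rewritten document keeps only the reachable objects, so an
// annotation that no page references any more is simply not written out.
ToyDocument fullSave(const ToyDocument &doc, int root)
{
    assert(doc.count(root) == 1); // the root object must exist
    std::set<int> seen;
    markReachable(doc, root, seen);

    ToyDocument out;
    for (int num : seen) {
        out.emplace(num, doc.at(num));
    }
    return out;
}
```

With a catalog (1) pointing at a page (2) whose /Annots array is now empty, an orphaned annotation object (5) is absent from the result of fullSave(doc, 1), so its text never reaches the new file.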

Leonard

From: poppler <poppler-bounces at lists.freedesktop.org> on behalf of Zeke Williams <lakeleaf8 at gmail.com>
Date: Friday, September 23, 2022 at 9:20 AM
To: poppler at lists.freedesktop.org <poppler at lists.freedesktop.org>
Subject: [poppler] There is a flaw with poppler that needs to be fixed. Deleted annotations are not actually deleted. I require assistance in fixing this.


I am not a very proficient C++ programmer, so I need help with this
issue in poppler. When you delete an annotation in a viewer such as
Okular or Evince, the part of the PDF document that displays the
annotation is removed, but the actual content is stored in a separate
part of the document and is not deleted. In other words, it is still
there. That is a privacy violation that should be fixed. I believe
this is the part of poppler that removes the annotation:

bool Annots::removeAnnot(Annot *annot)
{
    auto idx = std::find(annots.begin(), annots.end(), annot);

    if (idx == annots.end()) {
        return false;
    } else {
        annot->decRefCnt();
        annots.erase(idx);
        return true;
    }
}

And from another PDF reader (PDF4QT) here is how it removes them:

void PDFDocumentBuilder::removeAnnotation(PDFObjectReference page, PDFObjectReference annotation)
{
    PDFDocumentDataLoaderDecorator loader(&m_storage);

    if (const PDFDictionary* pageDictionary = m_storage.getDictionaryFromObject(m_storage.getObjectByReference(page)))
    {
        std::vector<PDFObjectReference> annots = loader.readReferenceArrayFromDictionary(pageDictionary, "Annots");
        annots.erase(std::remove(annots.begin(), annots.end(), annotation), annots.end());

        PDFObjectFactory factory;
        factory.beginDictionary();
        factory.beginDictionaryItem("Annots");
        if (!annots.empty())
        {
            factory << annots;
        }
        else
        {
            factory << PDFObject();
        }
        factory.endDictionaryItem();
        factory.endDictionary();

        mergeTo(page, factory.takeObject());
    }

    setObject(annotation, PDFObject());
}

PDF4QT can be found here: https://github.com/JakubMelka/PDF4QT

What can we do to solve this? I think we should mimic how PDF4QT does
it. What do you think?