[poppler] Why poppler, which supports tagged PDFs, doesn't recognize some of the tags as a whole?

Leonard Rosenthol lrosenth at adobe.com
Thu Jun 24 14:21:41 UTC 2021


I believe the issue is that Evince, when doing copy/paste of text, does *not* look at the Tags but instead just uses the content stream (via TextOutputDev or equivalent).

Leonard
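To make this distinction concrete, here is a toy model in Python. It is a sketch under stated assumptions and is not poppler's or Evince's code. It assumes each text run from the content stream carries a position and the marked-content ID (MCID) that ties it to a structure element. `TextRun`, `copy_from_content_stream` and `copy_from_structure_element` are hypothetical names. A tag-unaware copy orders runs by position and picks up everything on the page. A tag-aware copy returns only the runs of one element. The margin line numbers in the sample page are invented artifacts, and the thread does not show what Evince actually pasted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextRun:
    """One text-showing operation from a page's content stream (toy model)."""
    text: str
    x: float
    y: float             # baseline; a larger y is higher on the page
    mcid: Optional[int]  # marked-content ID tying the run to a tag; None = artifact


def copy_from_content_stream(runs):
    """Tag-unaware copy: order runs top-to-bottom, left-to-right and join
    runs that share a baseline into one line (exact match, unlike real tools)."""
    lines = {}
    for run in sorted(runs, key=lambda r: (-r.y, r.x)):
        lines.setdefault(run.y, []).append(run.text)
    return "\n".join(" ".join(parts) for parts in lines.values())


def copy_from_structure_element(runs, mcids):
    """Tag-aware copy: keep only runs whose MCID belongs to the element,
    in the order the structure element lists its MCIDs."""
    by_mcid = {}
    for run in runs:
        if run.mcid is not None:
            by_mcid.setdefault(run.mcid, []).append(run.text)
    return "\n".join(" ".join(by_mcid[m]) for m in mcids if m in by_mcid)


# Hypothetical page: two tagged code lines plus untagged margin line numbers.
runs = [
    TextRun(r"\pdfdict_new:n {l_my_action_dict}", 50, 700, 0),
    TextRun(r"\pdfdict_put:nnn {l_my_action_dict}{S}{/URI}", 50, 688, 1),
    TextRun("1", 10, 700, None),
    TextRun("2", 10, 688, None),
]

print(copy_from_content_stream(runs))             # line numbers mixed into the text
print(copy_from_structure_element(runs, [0, 1]))  # only the element's own text
```

Real extractors such as TextOutputDev use far more elaborate layout analysis than exact baseline matching; the point is only that, when the tags are ignored, the copied text is whatever the layout analysis reconstructs rather than the tagged element as a whole.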

From: poppler <poppler-bounces at lists.freedesktop.org> on behalf of Germán Poo-Caamaño <gpoo at gnome.org>
Date: Thursday, June 24, 2021 at 9:49 AM
To: poppler at lists.freedesktop.org <poppler at lists.freedesktop.org>
Subject: Re: [poppler] Why poppler, which supports tagged PDFs, doesn't recognize some of the tags as a whole?
On Thu, 2021-06-24 at 10:44 +0200, Albert Astals Cid wrote:
> On Thursday, 24 June 2021, at 7:48:45 (CEST), Denis Bitouzé
> wrote:
> > Hi,
> >
> > the attached `test.pdf` file is properly tagged, as you can check
> > by loading it at:
> >
> >   ┌────
> >   │ https://www.ngpdf.com/loadFile
> >   └────
> >
> > and then looking at:
> >
> >   ┌────
> >   │ https://www.ngpdf.com/editor/editFile
> >   └────
> >
> > You can see that each line of the code:
> >
> >   ┌────
> >   │ \pdfdict_new:n   {l_my_action_dict}
> >   │ \pdfdict_put:nnn {l_my_action_dict}{Type}{/Action}
> >   │ \pdfdict_put:nnn {l_my_action_dict}{S}{/URI}
> >   │ \pdfdict_put:nnn {l_my_action_dict}{URI}{(https://www.latex-project.org/)}
> >   └────
> >
> > is a single tag.
> >
> > Nevertheless, this code, if copied e.g. from Evince 3.38.1, is
> > pasted not as it is but as:
>
> That would be a question for the Evince developers (some of them are
> on this list, I guess, so you may still get an answer).
>
> The fact that poppler has facilities to "see" the contents of tagged
> PDFs doesn't mean that Evince is using them.

I am unsure what the report or question is about. Is it about
presenting/seeing each tag separately, or about copying/pasting the text
in the tags?

If the latter, that corresponds to poppler-glib.

--
Germán Poo-Caamaño
https://calcifer.org/



_______________________________________________
poppler mailing list
poppler at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/poppler