[a11y] LibreOffice Calc exposes 2^31 children, freezes on `GetChildren`

Tue Jun 11 07:55:47 UTC 2024

Hi,

thanks for sharing your thoughts!

On 2024-06-10 16:37, Michael Meeks wrote:
>      Let me add my 2 cents; a spreadsheet can have 2^20 rows, and 2^14 
> columns - that's 34 bits already - so even in the case that you thought 
> you wanted to iterate them all over a remote bus - you don't.

True.
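
For reference, the arithmetic as a quick sketch, assuming a sheet of 2^20 
rows by 2^14 columns and a signed 32-bit child count or index (the 
constant names are just for illustration):

```python
ROWS = 2 ** 20            # 1,048,576 rows
COLUMNS = 2 ** 14         # 16,384 columns
INT32_MAX = 2 ** 31 - 1   # largest value a signed 32-bit count can hold

cells = ROWS * COLUMNS
# cells is an exact power of two, so bit_length() - 1 is its exponent
bits_needed = cells.bit_length() - 1

print(cells)              # 17179869184
print(bits_needed)        # 34
print(cells > INT32_MAX)  # True
```

So the full cell count cannot even be represented in a 32-bit child 
count, let alone be iterated.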

>      Indeed - the whole idea is madness; there was this 
> manages-descendants hack in the past to try to tag such widgets to avoid 
> iterating them.
>      My attempts to encourage people to expose only the visible (or near 
> visible) items in the past were not that productive; but I still firmly 
> believe this is the only sensible way to do this.

Large tables are actually the example that the AT-SPI MANAGES_DESCENDANTS 
doc currently uses [1]:

"Used to prevent need to enumerate all children in very large 
containers, like tables. The presence of 
Atspi.StateType.MANAGES_DESCENDANTS is an indication to the client that 
the children should not, and need not, be enumerated by the client."
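
To make the intended client behavior concrete, here is a minimal sketch 
of a tree walk that honors the state. It uses a toy model (the `Node` and 
`State` types and the `walk` function are made up for illustration), not 
the real AT-SPI bindings; with those, I'd expect the check to go through 
the accessible's state set instead.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    # Stand-in for Atspi.StateType.MANAGES_DESCENDANTS (toy model only)
    MANAGES_DESCENDANTS = auto()


@dataclass
class Node:
    name: str
    states: set = field(default_factory=set)
    children: list = field(default_factory=list)


def walk(node, visit):
    """Visit node, then its children, unless the node manages its descendants."""
    visit(node)
    if State.MANAGES_DESCENDANTS in node.states:
        return  # the container asked clients not to enumerate its children
    for child in node.children:
        walk(child, visit)


table = Node("table", {State.MANAGES_DESCENDANTS},
             [Node(f"cell {i}") for i in range(1000)])
frame = Node("frame", children=[Node("toolbar", children=[Node("button")]), table])

visited = []
walk(frame, lambda n: visited.append(n.name))
print(visited)  # ['frame', 'toolbar', 'button', 'table']
```

The table itself is still visited, but none of its cells are.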

>      Best of luck with this; I would really recommend that we focus on 
> exposing only the data that is either visible - or better close to 
> visible (ie. within a page-up/page-down / etc. around the document), 
> with perhaps an extension of peers for eg. interesting headings in the 
> document so these can be cached and enumerated (ie. what you see in the 
> navigator).

Limiting children to the "close to visible" cells sounds like a 
potential approach.

However, that would IMHO still need a clear specification on how to 
implement it and how all relevant AT use cases are covered.

Some aspects/questions that might need some further consideration:

* How do other interfaces (like AT-SPI Table, TableCell and Selection) 
expose information? Does e.g. the table report it only has 50 rows and 
30 columns if that's what's visible on screen? Does cell Q227 report a 
row and column index of 0 if it's the first one in the visible area?

* In some cases, off-screen children are of interest, e.g. if they are 
contained in the current selection. How should that be handled? (E.g. 
how does the screen reader announce something like "cell A1 to C100 
selected" if cell A1 "doesn't exist" because it's off-screen?)

* Exposing and caching all cells based on visibility means that whenever 
the viewport changes, the set of exposed cells needs to be actively 
updated (push approach), which comes with a cost (that I can't estimate 
right now).
(We currently have that for other modules, see e.g. comment [3] for 
Impress.)

* How do screen readers implement features like "read the whole row"? Do 
they just read the part of the row that's currently visible on screen 
and leave out the rest? Or do they somehow implement some extra logic to 
retrieve the remaining content?

* Is navigating to an "arbitrary" cell still possible via a11y API (e.g. 
if some screen reader specific table navigation command implements "jump 
to the first/last cell in the table" or "select the current row")?

As mentioned earlier, the discussion in GTK issue [2] provides some 
valuable insights and ideas, but doesn't answer all questions yet, and 
there are likely more when looking further into the details.

> though of course it is then ideal to have some nice navigation API support wrapped around that

What kind of API does that refer to? Existing or new API on the platform 
a11y level that LO (or the toolkits it uses) would then implement, or 
something else? Do you have anything particular in mind?

>      Oddly, Writer - which could prolly cope rather better with exposing 
> all paragraphs set out by cropping to the visible content, whereas Calc 
> where this was always a silly idea tried to expose everything ;-)

That's indeed unfortunate...

I've been told repeatedly that the fact that Writer doesn't expose 
off-screen document content is indeed a problem as it breaks features 
like browse mode/document navigation in NVDA or Orca (see e.g. 
tdf#35652, tdf#137955, tdf#91739, tdf#96492).

Exposing off-screen Writer document content is actually something I plan 
to look into at some point. My idea so far is to also expose pages on 
the a11y level, which should avoid the problem of a single object (the 
document) having an enormous amount of children due to that.
If there are any general concerns about that, please raise them. :-)

The feedback I've received from a11y experts so far is that off-screen 
doc content should *generally* be exposed on the a11y level, and 
limiting Calc to not do that with its huge amount of table cells is 
meant to be an exception to the rule in that regard (see e.g. the 
discussion in [2] and tdf#156657).

I think it's fair to treat that specially, but (repeating myself here) 
my take is it needs clarity on what's the "correct" way to do that, and 
that's something that would IMHO ideally be clearly specified by AT 
and/or a11y protocol developers in a general guideline that app 
developers can cling to, rather than LO inventing something by itself.

If anyone has further thoughts on that, please don't hesitate to share 
them! :-)

[1] 
https://lazka.github.io/pgi-docs/Atspi-2.0/enums.html#Atspi.StateType.MANAGES_DESCENDANTS
[2] https://gitlab.gnome.org/GNOME/gtk/-/issues/6204
[3] 
https://gerrit.libreoffice.org/c/core/+/137622/comments/c5f34b0f_c47a1b82