[a11y] LibreOffice Calc exposes 2^31 children, freezes on `GetChildren`

Tue Jun 11 09:40:23 UTC 2024

Hi Michael,

	Some great questions and points here; forgive my intruding into the space again =)

On 11/06/2024 08:55, Michael Weghorn wrote:
> Limiting children to the "close to visible" cells sounds like a 
> potential approach.

	For a performant, caching AT I would argue this is the only feasible option =) But ... the navigation API I mentioned is important.

> However, that would IMHO still need a clear specification on how to 
> implement it and how all relevant AT use cases are covered.

	Agreed =)

> Some aspects/questions that might need some further consideration:
> 
> * How do other interfaces (like AT-SPI Table, TableCell and Selection) 
> expose information? Does e.g. the table report it only has 50 rows and 
> 30 columns if that's what's visible on screen? Does cell Q227 report a 
> row and column index of 0 if it's the first one in the visible area?

	I think exposing the whole thing through the Table interface may make some sense; it is clearly a crazy set of cells - and I think it's reasonable to blame ATs if they use this interface for doing something silly.
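
	To make that concrete, here is a minimal Python sketch of one possible shape, not a description of how LibreOffice or AT-SPI behaves today. The names (`Viewport`, `SheetTable`, `margin`) and the sheet size are illustrative assumptions. The table reports its full logical size, while only cells near the view exist as children, and those keep their absolute document coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    first_row: int  # absolute index of the first visible row
    first_col: int  # absolute index of the first visible column
    rows: int       # number of visible rows
    cols: int       # number of visible columns


class SheetTable:
    """Hypothetical accessible table: full logical size, bounded child set."""

    def __init__(self, n_rows, n_cols, viewport, margin=2):
        self.n_rows = n_rows      # what a Table-style interface would report
        self.n_cols = n_cols
        self.viewport = viewport
        self.margin = margin      # extra rows/columns kept around the view

    def child_range(self):
        """Absolute row and column ranges that exist as real child objects."""
        v, m = self.viewport, self.margin
        rows = range(max(0, v.first_row - m),
                     min(self.n_rows, v.first_row + v.rows + m))
        cols = range(max(0, v.first_col - m),
                     min(self.n_cols, v.first_col + v.cols + m))
        return rows, cols

    def child_count(self):
        rows, cols = self.child_range()
        return len(rows) * len(cols)

    def child_index_to_cell(self, index):
        """Map a child index to absolute (row, col) document coordinates."""
        rows, cols = self.child_range()
        if not 0 <= index < len(rows) * len(cols):
            raise IndexError(index)
        r, c = divmod(index, len(cols))
        return rows[r], cols[c]


# Q227 is the first visible cell: row index 226, column index 16 (A = 0).
table = SheetTable(1_048_576, 16_384, Viewport(226, 16, 50, 30))
print(table.child_count())             # bounded by the view, not the sheet
print(table.child_index_to_cell(70))   # absolute coordinates of Q227
```

	In this sketch Q227 keeps row index 226 and column index 16 rather than being renumbered to 0, so a selection such as A1 to C100 can still be described in document coordinates even when A1 has no live peer.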

> * In some cases, off-screen children are of interest, e.g. if they are 
> contained in the current selection. How should that be handled? (e.g. 
> how does the screen reader announce something like "cell A1 to C100 
> selected" if cell A1 "doesn't exist" because it's off-screen?

	Right; so - I mentioned "near to the screen" - by near, I mean we will probably want a number of things that are navigationally close, e.g. "next heading" or some such, to lurk around as real & tracked peers. The content of the Navigator headings should probably always be present in a Writer document's object hierarchy IMHO. That should let ATs very quickly enumerate headings, jump focus to them with a simple API etc.

> * Exposing and caching all cells based on visibility means that whenever 
> the view port changes, this needs to be actively updated (push 
> approach), which comes with a cost (that I can't estimate right now).
> (We currently have that for other modules, see e.g. comment [3] for 

	That is the case; we do that for writer on page-down of course.

> * How do screen readers implement features like "read the whole row"? 

	This comes down to the navigation API I mentioned: having a good API to allow continuous screen-reading of large data-sets - with caching, pre-loading & fetching along e.g. a selection - is really useful. Current Writer behavior is far from optimal since you need to do something odd to get a simple navigation such as "next page" IIRC.
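
	As a rough illustration of the pre-loading idea, assuming a hypothetical batched call `fetch_cells(start, count)` that returns the text of `count` consecutive cells in one request (the name, the `FakeBackend` class and the chunk size are all made up for this sketch):

```python
def read_row(fetch_cells, n_cols, chunk=64):
    """Yield cell texts in order, asking for `chunk` cells per request."""
    for start in range(0, n_cols, chunk):
        count = min(chunk, n_cols - start)
        yield from fetch_cells(start, count)


class FakeBackend:
    """Stand-in for the application side; counts requests it receives."""

    def __init__(self):
        self.requests = 0

    def fetch_cells(self, start, count):
        self.requests += 1
        return [f"cell {i}" for i in range(start, start + count)]


backend = FakeBackend()
texts = list(read_row(backend.fetch_cells, n_cols=200))
print(len(texts), "cells read in", backend.requests, "requests")
```

	Because `read_row` is a generator, the AT can start speaking the first chunk while later chunks have not been requested yet, and it can simply stop iterating if the user interrupts.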

> Do 
> they just read the part of the row that's currently visible on screen 
> and leave out the rest? Or do they somehow implement some extra logic
> to retrieve the remaining content?

	Extra navigation / enumeration logic I think.

> * Is navigating to an "arbitrary" cell still possible via a11y API, e.g. 
> if some screen reader specific table navigation command implements "jump 
> to the first/last cell in the table" or "select the current row")?

	There should be a widget for this at the top of Calc to jump to and select cells - if I were customizing an AT I'd use that.

>> though of course it is then ideal to have some nice navigation API 
>> support wrapped around that
> 
> What kind of API does that refer to? Existing or new API on the
> platform a11y level that LO (or the toolkits it uses) would then
> implement, or something else? Do you have anything particular in mind?

	I was actually somewhat optimistic about the UIA navigation API conceptually:

https://learn.microsoft.com/en-us/windows/win32/api/uiautomationcore/nf-uiautomationcore-irawelementproviderfragment-navigate

	Although I'd really suggest that a11y shouldn't work against the application: when navigating, it should allow the AT to scroll the actual visible view-port to match what is being interrogated.
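
	A small Python sketch of that style of navigation follows. The five directions mirror UIA's NavigateDirection values (Parent, NextSibling, PreviousSibling, FirstChild, LastChild); the `Sheet` and `Row` classes and the `created` counter are hypothetical stand-ins, not anything LibreOffice implements. The point is that a neighbour is only built when the AT asks for it.

```python
from enum import Enum, auto


class Direction(Enum):
    PARENT = auto()
    NEXT_SIBLING = auto()
    PREVIOUS_SIBLING = auto()
    FIRST_CHILD = auto()
    LAST_CHILD = auto()


class Sheet:
    """Hypothetical container that never enumerates all of its rows."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.created = 0  # number of row objects actually built

    def row(self, index):
        self.created += 1
        return Row(self, index)

    def navigate(self, direction):
        if direction is Direction.FIRST_CHILD and self.n_rows:
            return self.row(0)
        if direction is Direction.LAST_CHILD and self.n_rows:
            return self.row(self.n_rows - 1)
        return None  # nothing in that direction


class Row:
    def __init__(self, sheet, index):
        self.sheet = sheet
        self.index = index

    def navigate(self, direction):
        if direction is Direction.PARENT:
            return self.sheet
        if (direction is Direction.NEXT_SIBLING
                and self.index + 1 < self.sheet.n_rows):
            return self.sheet.row(self.index + 1)
        if direction is Direction.PREVIOUS_SIBLING and self.index > 0:
            return self.sheet.row(self.index - 1)
        return None  # nothing in that direction


sheet = Sheet(1_048_576)
last = sheet.navigate(Direction.LAST_CHILD)
before = last.navigate(Direction.PREVIOUS_SIBLING)
print(last.index, before.index, sheet.created)
```

	Jumping to the last row and stepping back one costs two objects here, regardless of how many rows the sheet has.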

> I've been told repeatedly that the fact that Writer doesn't expose 
> off-screen document content is indeed a problem as it breaks features 
> like browse mode/document navigation in NVDA or Orca (see e.g. 
> tdf#35652, tdf#137955, tdf#91739, tdf#96492).

	I'm not convinced that "can't see it in Accerciser" is a real problem; what we all want are fast, accurate and helpful ATs for the impaired. Perhaps I misunderstand, but browse-mode seems conceptually similar to navigating by scrolling the current window down a document - without changing the cursor/document focus.

> to look into at some point. My idea so far is to also expose pages on 
> the a11y level, which should avoid the problem of a single object (the 
> document) having an enormous amount of children due to that.
> If there are any general concerns about that, please raise them. :-)

	I guess this moves the problem to re-pagination; where we can get 300+ pages re-built for the sake of moving a single paragraph; then again - I guess if we are notifying changes in position on large sets of accessible peers we have a similar problem.

> The feedback I've received from a11y experts so far is that off-screen 
> doc content should *generally* be exposed on the a11y level, and 
> limiting Calc to not do that with its huge amount of table cells is 
> meant to be an exception to the rule in that regard (see e.g. the 
> discussion in [2] and tdf#156657).

	I really think that's a mistake that will ultimately hurt AT performance and that we should focus on the end-user use-cases we want to succeed with - rather than having an abstract absolutist pre-conception that we can expose everything in an efficient way =)

> I think it's fair to treat that specially, but (repeating myself here) 
> my take is it needs clarity on what's the "correct" way to do
> that, and that's something that would IMHO ideally be clearly
> specified by AT and/or a11y protocol developers in a general
> guideline that app developers can cling to, rather than LO
> inventing something by itself.

	Sure - that's important; of course - the a11y space is traditionally tragically under-funded; so I suspect our approach is in itself somewhat influential; but whatever it is, it is worth agreeing & writing down.

> If anyone has further thoughts on that, please don't hesitate to share 
> them! :-)

	Of course; I'm just one viewpoint. My strong feeling is that focusing on things that make it easier to code fast, simple ATs that meet the common use-cases people want is the vital thing; and I really think that trying to re-build arbitrary document models and synchronize them on the other end of a bus - particularly if we require lots of synchronous round-trips to interrogate the content - is not going to fly.
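
	A back-of-the-envelope calculation shows the scale. The child count is the 2^31 from the subject line and the 50 x 30 visible area is from the example quoted above; the 100 microseconds per synchronous round-trip is purely an assumed figure for illustration, not a measurement.

```python
CHILDREN = 2**31        # child count from the subject line
ROUND_TRIP_S = 100e-6   # ASSUMED cost of one synchronous call (100 us)
VISIBLE = 50 * 30       # visible rows x columns from the quoted example


def seconds(calls, per_call_s=ROUND_TRIP_S):
    """Total time for `calls` sequential round-trips."""
    return calls * per_call_s


print(f"all children:  {seconds(CHILDREN) / 3600:.1f} hours")
print(f"visible cells: {seconds(VISIBLE):.2f} seconds")
```

	Even with one call per child and a fairly optimistic latency, walking every child takes days rather than seconds, while a bounded near-visible set stays well under a second.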

On 11/06/2024 09:49, Michael Weghorn wrote:
>> Otherwise, as long as the underlying platform a11y protocols are 
>> pull-based and given the input I've received up to this point, I tend 
>> to think that ATs actively querying the tree are primarily responsible 
>> for limiting that to a reasonable amount of information, but I'm 
>> thankful for any guidance here...

	It's a nice hope =) I'd want to create APIs that capture the common things that ATs want to do, make them easy, and really hard to screw up.

	But now I shut up ;-) we're working on the web side of this; caching bits in the browser and adding another protocol latency there - and I'm sure we want to be handling a reasonably bounded set of data there =)

	Regards,

		Michael.

-- 
michael.meeks at collabora.com <><, CEO Collabora Productivity
(M) +44 7795 666 147 - timezone usually UK / Europe