KFD event handling questions

Thangirala, Hari Hari.Thangirala at amd.com
Sat Sep 23 00:44:22 UTC 2017


Hi Felix,

Event IDs reserved for debug events was an artifact of a debugger design. We changed our tools strategy so those limitations do not exist anymore. There's no need to reserve special event-IDs for debugger. 4096 is good enough and the new debug trap handler will use events from that pool.

Thanks
Hari



-----Original Message-----
From: Kuehling, Felix 
Sent: Thursday, September 21, 2017 9:36 PM
To: amd-gfx at lists.freedesktop.org; Oded Gabbay <oded.gabbay at gmail.com>; Thangirala, Hari <Hari.Thangirala at amd.com>
Subject: KFD event handling questions

[Hari, can you comment on the last paragraph below regarding the events limit? Would it hurt the runtime much to lose 8 signals out of 4096?]

Hi Oded,

I'm looking into upstreaming some event handling enhancements. I see that you actually worked on increasing the number of events to 4096 and working around some problems with debug events when you were at AMD.

As I'm looking at this, I'm confused by the code that allocates signal pages. It allocates a single signal page that's big enough to accommodate all signals. But allocate_signal_page looks like it was meant to allocate smaller signal pages incrementally that are accumulated in a kfd_process::signal_event_pages list. However, this feature never gets used. It would also break user mode because the Thunk expects to be able to map a single signal page for all its signals.

Therefore I'm inclined to simplify allocate_signal_page and related code to only deal with a single allocation. Do you have any concerns or objections about that?

Somewhat related to that, you also added a workaround that increases the signal number to 4096+512 to accommodate 8 debug events while maintaining page alignment. However, the signal page is allocated with __get_free_pages, which allocates in powers of 2. So in fact it needs to allocate 8192 signals, wasting about 28KB per process. It's not a lot of waste. But an alternative would be to instead sacrifice 8 user signals.
The runtime has a fallback for when it runs out of signals, and the difference between 4096 and 4088 seems insignificant. What do you think?

Regards,
  Felix

-- 
F e l i x   K u e h l i n g
PMTS Software Development Engineer | Vertical Workstation/Compute
1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
(O) +1(289)695-1597
   _     _   _   _____   _____
  / \   | \ / | |  _  \  \ _  |
 / A \  | \M/ | | |D) )  /|_| |
/_/ \_\ |_| |_| |_____/ |__/ \|   facebook.com/AMD | amd.com



More information about the amd-gfx mailing list