<html><body><div style="font-family: times new roman, new york, times, serif; font-size: 12pt; color: #000000"><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Mar 28, 2017 at 5:14 PM, Frediano Ziglio <span dir="ltr"><<a href="mailto:fziglio@redhat.com" target="_blank">fziglio@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">><br>
> The main goal is to reduce the time spent in the GDI callback (PresentDisplayOnly)<br>
> and avoid the situation where processing takes more than 2 seconds and triggers<br>
> the class driver watchdog.<br>
><br>
> 1. We offload the sending of drawable commands to a separate thread (waiting for<br>
> room in the command ring may take an unpredictable amount of time).<br>
> 2. When device memory usage is high, allocating a bitmap for the rectangle<br>
> to draw may also take an unpredictable amount of time (note that a single<br>
> full-screen redraw requires >3 MB of space).<br>
> So we make drawable object allocation from the GDI callback fast and<br>
> non-forced, and if it fails we fall back to allocation from the OS heap.<br>
> 3. Before sending a drawable command, the thread takes care of the objects<br>
> that were allocated from the OS heap and allocates them from device memory<br>
> (at this point we are no longer limited by time).<br>
> 4. We still do not enable VSync automatically, but it can be enabled for<br>
> evaluation/testing purposes via a setting in the driver's registry.<br>
><br>
<br>
</span>A big issue of this approach is that it does not entirely solve<br>
the problem but moves it.<br></blockquote><div><br></div><div>We can't spend too much time waiting for memory in the OS callback.</div><div>In our own thread we can wait as long as we need.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Instead of waiting for device memory we fall back to system memory,<br>
and when we can send commands we copy the data back to device memory and<br>
send it, increasing system memory usage and the number of memory copies.<br></blockquote><div><br></div><div>Yes, that's correct. Our processing in the OS callback must be fast, and I do not see how we can</div><div>solve this without using host memory and without skipping operations.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
However, I cannot see any limit, so potentially we'll fill<br>
system memory until the guest crashes. And if we add a limit,<br>
this will potentially just move the hang to a later point.<br></blockquote><div><br></div><div>We allocate pageable memory, which is much less limited than non-pageable memory,</div><div>and the typical amount of available pageable memory is > 1G.</div><div>When working in a LAN environment, we rarely need to allocate host memory.</div><div>With a long end-to-end delay under heavy scenarios, I did not see a huge amount of outstanding allocations.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
As far as I know we always have available 3 times the amount<br>
of memory of the maximum frame buffer, so in theory plenty of<br>
space. But watching the drawing from the client I can see<br>
a lot of redrawing of the same area again and again, so maybe<br>
this is causing the issues with the memory.<br>
Maybe we can find a smarter way to solve this memory issue?<br></blockquote><div><br></div><div>I would suggest looking for possible improvements later.</div><div>I have some ideas, but they do not invalidate the current solution.</div></div></div></div></blockquote><div>I was talking with Jonathon about the different memory layouts of different drivers.<br></div><div>It turns out that this "new" DOD driver uses a different layout from the previous Windows<br></div><div>driver. Specifically, Bar0 (DEVRAM) is used for the frame buffer and monitor configs<br></div><div>while everything else is in VRAM. Previous Windows drivers used VRAM only<br></div><div>for off-screen surfaces (so they were basically always using DEVRAM). But according to<br></div><div>our data and to <a href="http://www.ovirt.org/documentation/draft/video-ram/">http://www.ovirt.org/documentation/draft/video-ram/</a> the allocation<br></div><div>for VRAM can be really small (8 MB), which is not good for a WDDM driver.<br></div><div><br></div><div>Some time ago, for Linux, I proposed a patch that basically adds a kind of fallback for memory<br></div><div>allocations. If allocating on one bar failed, the other was tried (and deallocations of course<br></div><div>detected the bar used based on the pointer). I think it would make sense to try such<br></div><div>a strategy, if only to make guest system upgrades easier. Writing a patch.<br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><br>
> Yuri Benditovich (12):<br>
> qxl-wddm-dod: Prepare system thread for rendering<br>
> qxl-wddm-dod: Use rendering offload thread<br>
> qxl-wddm-dod: Introduce TimeMeasurement class for timing debugging<br>
> qxl-wddm-dod: Debug warning on long wait on event<br>
> qxl-wddm-dod: Reduce amount of unnecessary printouts<br>
> qxl-wddm-dod: Registry-based control over VSync<br>
> qxl-wddm-dod: Set VSync indication period to 200ms<br>
> qxl-wddm-dod: Prepare for failure to allocate memory<br>
> qxl-wddm-dod: PutBytesAlign supports non-forced allocation<br>
> qxl-wddm-dod: Optimize allocation of memory chunks<br>
> qxl-wddm-dod: Implement non-forced bitmap allocation<br>
> qxl-wddm-dod: Non-forced memory allocations with VSync<br>
><br>
> qxldod/QxlDod.cpp | 581 +++++++++++++++++++++++++++++++++++++++++++++---------<br>
> qxldod/QxlDod.h | 87 +++++++-<br>
> qxldod/driver.cpp | 35 ++++<br>
> 3 files changed, 606 insertions(+), 97 deletions(-)<br>
><br></div></div></blockquote></div></div></div></blockquote><div>Frediano</div><div><br></div></div></body></html>