<html><body><div style="font-family: times new roman, new york, times, serif; font-size: 12pt; color: #000000"><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Mar 28, 2017 at 5:14 PM, Frediano Ziglio <span dir="ltr">&lt;<a href="mailto:fziglio@redhat.com" target="_blank">fziglio@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">><br> > The main goal is to reduce time in the GDI callback (PresentDisplayOnly) and<br> > avoid<br> > the situation where the processing takes more than 2 seconds, triggering the class driver<br> > watchdog.<br> ><br> > 1. We offload sending of drawable commands to a separate thread (waiting for<br> > room in the command ring<br> > may take unpredictable time)<br> > 2. In case the usage of device memory is high, allocation of a bitmap for<br> > the rectangle to draw<br> > also may take unpredictable time (note that a single full screen redraw<br> > requires >3 MB of space)<br> > So, we make drawable object allocations from the GDI callback fast and<br> > non-forced and in case they<br> > fail we provide alternate allocation from the OS heap<br> > 3. The thread, before sending a drawable command, shall take care of these objects<br> > that were allocated from<br> > the OS heap and allocate them from device memory (now we are not limited by<br> > time)<br> > 4. 
We still do not enable VSync automatically, but this can be done for<br> > evaluation/testing purposes via<br> > a setting in the driver's registry<br> ><br> <br> </span>A big issue of this approach is that it does not entirely solve<br> the problem but moves it.<br></blockquote><div><br></div><div>We can't spend too much time waiting for memory in the OS callback.</div><div>In our own thread our wait can be as long as we want.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Instead of waiting for device memory we fall back to system memory<br> and when we can send commands we copy back to device memory and<br> send them, increasing system memory usage and memory copies.<br></blockquote><div><br></div><div>Yes, that's correct. Our processing in the OS callback must be fast and I do not see how we can</div><div>solve it without using host memory and without skipping operations.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> However I cannot see any limitation so potentially we'll fill<br> system memory until the guest crashes. And if we add a limitation<br> potentially this will just move the hang to later.<br></blockquote><div><br></div><div>We allocate pageable memory, which is much less limited than non-pageable memory, </div><div>and the typical amount of available pageable memory is > 1 GB.</div><div>When working in a LAN environment, there are only rare cases where we need to allocate host memory.</div><div>With long end-to-end delay under heavy scenarios I did not see a huge amount of outstanding allocations.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> As far as I know we always have available 3 times the amount<br> of memory of the maximum frame buffer, so in theory plenty of<br> space. 
But trying to see the drawing from the client I can see<br> a lot of redrawing of the same area again and again so maybe<br> this is causing the issues with the memory.<br> Maybe we can find a smarter way to solve this memory issue?<br></blockquote><div><br></div><div>I would suggest looking for possible improvements later.</div><div>I have some ideas but they do not invalidate the current solution.</div></div></div></div></blockquote><div>I was talking with Jonathon about the different memory layouts of different drivers.<br></div><div>It turns out that this "new" DOD driver uses a different layout from the previous Windows<br></div><div>driver. Specifically, Bar0 (DEVRAM) is used for the frame buffer and monitor configs<br></div><div>while everything else is in VRAM. Previous Windows drivers used VRAM only<br></div><div>for off-screen surfaces (so basically they were always using DEVRAM). But according to<br></div><div>our data and from <a href="http://www.ovirt.org/documentation/draft/video-ram/">http://www.ovirt.org/documentation/draft/video-ram/</a> the allocation<br></div><div>for VRAM can be really small (8 MB), which is not good for the WDDM driver.<br></div><div><br></div><div>Some time ago, for Linux, I proposed a patch that basically has a kind of fallback for memory<br></div><div>allocations. If allocation failed on one bar, the other was tried (and deallocations of course<br></div><div>detected the bar used based on the pointer). I think it would make sense to try such<br></div><div>a strategy, also to make guest system upgrades easier. 
Writing a patch.<br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><br> > Yuri Benditovich (12):<br> > qxl-wddm-dod: Prepare system thread for rendering<br> > qxl-wddm-dod: Use rendering offload thread<br> > qxl-wddm-dod: Introduce TimeMeasurement class for timing debugging<br> > qxl-wddm-dod: Debug warning on long wait on event<br> > qxl-wddm-dod: Reduce amount of unnecessary printouts<br> > qxl-wddm-dod: Registry-based control over VSync<br> > qxl-wddm-dod: Set VSync indication period to 200ms<br> > qxl-wddm-dod: Prepare for failure to allocate memory<br> > qxl-wddm-dod: PutBytesAlign supports non-forced allocation<br> > qxl-wddm-dod: Optimize allocation of memory chunks<br> > qxl-wddm-dod: Implement non-forced bitmap allocation<br> > qxl-wddm-dod: Non-forced memory allocations with VSync<br> ><br> > qxldod/QxlDod.cpp | 581<br> > +++++++++++++++++++++++++++++++++++++++++++++---------<br> > qxldod/QxlDod.h | 87 +++++++-<br> > qxldod/driver.cpp | 35 ++++<br> > 3 files changed, 606 insertions(+), 97 deletions(-)<br> ><br></div></div></blockquote></div></div></div></blockquote><div>Frediano</div><div><br></div></div></body></html>