<div dir="ltr"><div>The Gallium HUD doesn't consume strings. It only consumes values that are exposed as counters from the driver. In this case, we need the driver to expose evicted stats as counters. Each counter can set whether the value is absolute (e.g. memory usage) or monotonic (e.g. perf counter). Parsing fdinfo to get the values is undesirable.<br></div><div><br></div><div>Marek<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 23, 2023 at 4:31 AM Christian König <<a href="mailto:ckoenig.leichtzumerken@gmail.com">ckoenig.leichtzumerken@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    Let's do this as values in fdinfo.<br>
    <br>
    This way we can easily extend whatever the kernel wants to display
    as statistics in the userspace HUD.<br>
    <br>
    Regards,<br>
    Christian.<br>
    <br>
    <div>On 21.01.23 at 01:45, Marek
      Olšák wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div>We badly need a way to query evicted memory usage. It's
          essential for investigating performance problems and it
          uncovered the buddy allocator disaster. Please either suggest
          an alternative, suggest changes, or review. We need it ASAP.<br>
        </div>
        <div><br>
        </div>
        <div>Thanks,</div>
        <div>Marek<br>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Tue, Jan 10, 2023 at 11:55
          AM Marek Olšák <<a href="mailto:maraeo@gmail.com" target="_blank">maraeo@gmail.com</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div dir="ltr">
            <div class="gmail_quote">
              <div dir="ltr" class="gmail_attr">On Tue, Jan 10, 2023 at
                11:23 AM Christian König <<a href="mailto:ckoenig.leichtzumerken@gmail.com" target="_blank">ckoenig.leichtzumerken@gmail.com</a>>
                wrote:<br>
              </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                <div> On 10.01.23 at 16:28, Marek Olšák wrote:<br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_quote">
                        <div dir="ltr" class="gmail_attr">On Wed, Jan 4,
                          2023 at 9:51 AM Christian König <<a href="mailto:ckoenig.leichtzumerken@gmail.com" target="_blank">ckoenig.leichtzumerken@gmail.com</a>>
                          wrote:<br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div> On 04.01.23 at 00:08, Marek
                            Olšák wrote:<br>
                            <blockquote type="cite">
                              <div dir="ltr">
                                <div>I see about the access now, but did
                                  you even look at the patch?</div>
                              </div>
                            </blockquote>
                            <br>
                            I did look at the patch, but I haven't fully
                            understood yet what you are trying to do
                            here.<br>
                          </div>
                        </blockquote>
                        <div><br>
                        </div>
                        <div>First and foremost, it returns the evicted
                          size of VRAM and visible VRAM, and returns
                          visible VRAM usage. It should be obvious which
                          stat includes the size of another.<br>
                        </div>
                        <div><br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div> <br>
                            <blockquote type="cite">
                              <div dir="ltr">
                                <div> Because what the patch does isn't
                                  even exposed to common drm code, such
                                  as the preferred domain and visible
                                  VRAM placement, so it can't be in
                                  fdinfo right now.<br>
                                </div>
                                <div><br>
                                </div>
                                <div>Or do you even know what fdinfo
                                  contains? Because it contains nothing
                                  useful. It only has VRAM and GTT
                                  usage, which we already have in the
                                  INFO ioctl, so it has nothing that we
                                  need. We mainly need the eviction
                                  information and visible VRAM
                                  information now. Everything else is a
                                  bonus.<br>
                                </div>
                              </div>
                            </blockquote>
                            <br>
                            Well the main question is what are you
                            trying to get from that information? The
                            eviction list for example is completely
                            meaningless to userspace, that stuff is only
                            temporary and will be cleared on the next CS
                            again.<br>
                          </div>
                        </blockquote>
                        <div><br>
                        </div>
                        <div>I don't know what you mean. The returned
                          eviction stats look correct and are stable
                          (they don't change much). You can suggest
                          changes if you think some numbers are not
                          reported correctly.<br>
                        </div>
                        <div> </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div> <br>
                            What we could expose is the VRAM overcommit
                            value, i.e. how much of the BOs that were
                            supposed to be in VRAM is in GTT now. I
                            think that's what you are looking for here,
                            right?<br>
                          </div>
                        </blockquote>
                        <div><br>
                        </div>
                        <div>The VRAM overcommit value is
                          "evicted_vram".<br>
                        </div>
                        <div> </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div> <br>
                            <blockquote type="cite">
                              <div dir="ltr">
                                <div>
                                  <div>Also, it's undesirable to open
                                    and parse a text file if we can just
                                    call an ioctl.</div>
                                </div>
                              </div>
                            </blockquote>
                            <br>
                            Well I see the reasoning for that, but I
                            also see why other drivers do a lot of the
                            stuff we have as IOCTL as separate files in
                            sysfs, fdinfo or debugfs.<br>
                            <br>
                            Especially repeating all the static
                            information which was already available
                            under sysfs in the INFO IOCTL was a design
                            mistake as far as I can see. Just compare
                            what AMDGPU and the KFD code is doing to
                            what for example i915 is doing.<br>
                            <br>
                            Same for things like debug information about
                            a process. The fdinfo stuff can be queried
                            from external tools (gdb, gputop, umr
                            etc...) as well which makes that interface
                            more preferred.<br>
                          </div>
                        </blockquote>
                        <div><br>
                        </div>
                        <div>Nothing uses fdinfo in Mesa. No driver uses
                          sysfs in Mesa except drm shims, noop drivers,
                          and Intel for perf metrics. sysfs itself is an
                          unusable mess for the PCIe query and is
                          missing information.</div>
                        <div><br>
                        </div>
                        <div>I'm not against exposing more stuff through
                          sysfs and fdinfo for tools, but I don't see
                          any reason why drivers should use it (other
                          than for slowing down queries and
                          initialization).</div>
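                        <div>To illustrate the text-parsing cost under discussion, here is a minimal sketch of what a userspace reader of fdinfo-style <code>key: value KiB</code> lines might look like. The helper name <code>fdinfo_get_kib</code> is hypothetical and the sample keys in the usage note are only assumed for illustration; this is not code from Mesa or the kernel.</div>

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan a buffer of fdinfo-style text for a line that
 * starts with "<key>:" and parse the unsigned number that follows it.
 * Returns 0 and stores the value on success, -1 if the key is missing or
 * has no numeric value.
 */
static int fdinfo_get_kib(const char *text, const char *key, uint64_t *out_kib)
{
    size_t key_len = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* Match the key only at the start of a line, followed by ':'. */
        if (strncmp(line, key, key_len) == 0 && line[key_len] == ':') {
            unsigned long long value;

            /* %llu skips the whitespace (tab or spaces) after the colon. */
            if (sscanf(line + key_len + 1, "%llu", &value) == 1) {
                *out_kib = value;
                return 0;
            }
            return -1;
        }
        /* Advance to the character after the next newline, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

                        <div>A caller would first read <code>/proc/self/fdinfo/&lt;fd&gt;</code> into a buffer and then call, for example, <code>fdinfo_get_kib(buf, "drm-memory-vram", &amp;kib)</code>. An ioctl-based query returns the same number in a struct without the open, read, and string scan.</div>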
                      </div>
                    </div>
                  </blockquote>
                  <br>
                  That's what I'm asking: Is this for some tool or to
                  make some driver decision based on it?<br>
                  <br>
                  If you just want the numbers for displaying, then
                  I think it would be better to put this into fdinfo
                  together with the other existing stuff there.<br>
                </div>
              </blockquote>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                <div> <br>
                  If you want to make allocation decisions based on this
                  then we should have that as IOCTL or even better as
                  mmap() page between kernel and userspace. But in this
                  case I would also calculate the numbers completely
                  differently.<br>
                  <br>
                  See we have at least the following things in the
                  kernel:<br>
                  1. The eviction list in the VM.<br>
                      Those are the BOs which are currently evicted and
                  which we try to move back in on the next CS.<br>
                  <br>
                  2. The VRAM overcommit value.<br>
                      In other words how much more VRAM than available
                  has the application tried to allocate?<br>
                  <br>
                  3. The visible VRAM usage by this application.<br>
                  <br>
                  The end goal is that the eviction list will go away,
                  e.g. we will always have stable allocations based on
                  allocations of other applications and not constantly
                  swap things in and out.<br>
                  <br>
                  When you now expose the eviction list to userspace we
                  will be stuck with this interface forever.<br>
                </div>
              </blockquote>
              <div><br>
              </div>
              <div>It's for the GALLIUM HUD.</div>
              <div><br>
              </div>
              <div>The only missing thing is the size of all evicted
                VRAM allocations, and the size of all evicted visible
                VRAM allocations.<br>
              </div>
              <div><br>
              </div>
              <div>1. No list is exposed. Only sums of buffer sizes are
                exposed. Also, the eviction list has no meaning here.
                All lists are treated equally, and mem_type is compared
                with preferred_domains to determine where buffers are
                and where they should be.<br>
              </div>
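              <div>The accounting described in point 1 can be sketched roughly as follows. This is a simplified userspace model, not the actual patch: the struct layout, the <code>DOMAIN_*</code> and <code>MEM_*</code> constants, and the <code>wants_cpu_visible</code> flag are all assumptions made for illustration.</div>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical domain bits and placements, for illustration only. */
enum { DOMAIN_GTT = 1u << 0, DOMAIN_VRAM = 1u << 1 };
enum mem_type { MEM_SYSTEM, MEM_GTT, MEM_VRAM };

/* Simplified stand-in for a buffer object. */
struct bo {
    uint64_t size;              /* size in bytes */
    uint32_t preferred_domains; /* where the buffer should be */
    enum mem_type mem_type;     /* where the buffer currently is */
    int wants_cpu_visible;      /* assumed flag: CPU-visible VRAM requested */
};

struct evict_stats {
    uint64_t evicted_vram;         /* bytes that prefer VRAM but are elsewhere */
    uint64_t evicted_visible_vram; /* subset that wanted CPU-visible VRAM */
};

/*
 * Walk every buffer regardless of which list it came from and compare its
 * current placement with its preferred domains. Only sums are produced;
 * no list is exposed to the caller.
 */
static struct evict_stats sum_evicted(const struct bo *bos, size_t count)
{
    struct evict_stats stats = {0, 0};

    for (size_t i = 0; i < count; i++) {
        int prefers_vram = (bos[i].preferred_domains & DOMAIN_VRAM) != 0;

        if (prefers_vram && bos[i].mem_type != MEM_VRAM) {
            stats.evicted_vram += bos[i].size;
            if (bos[i].wants_cpu_visible)
                stats.evicted_visible_vram += bos[i].size;
        }
    }
    return stats;
}
```

              <div>In this model a buffer counts as evicted whenever VRAM is among its preferred domains and it currently resides anywhere else; whether a buffer that also lists GTT as preferred should count is a policy choice the sketch does not settle.</div>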
              <div><br>
              </div>
              <div>2. I'm not interested in the overcommit value. I'm
                only interested in knowing the number of bytes of
                evicted VRAM right now. It can be as variable as the CPU
                load, but in practice it shouldn't be because PCIe
                doesn't have the bandwidth to move things quickly.<br>
              </div>
              <div><br>
              </div>
              <div>3. Yes, that's true.</div>
              <div><br>
              </div>
              <div>Marek</div>
              <br>
            </div>
          </div>
        </blockquote>
      </div>
    </blockquote>
    <br>
  </div>

</blockquote></div>