[Intel-gfx] [RFC] GPU reset notification interface

Ian Romanick idr at freedesktop.org
Wed Jul 18 03:57:30 CEST 2012


On 07/17/2012 03:16 PM, Ian Romanick wrote:
> I'm getting ready to implement the reset notification part of
> GL_ARB_robustness in the i965 driver.  There are a bunch of quirky bits
> of the extension that are causing some grief in designing the kernel /
> user interface.  I think I've settled on an interface that should meet
> all of the requirements, but I want to bounce it off people before I
> start writing code.
>
> Here's the list of requirements.
>
> - Applications poll for reset status.
>
> - Contexts that did not lose data or rendering should not receive a
> reset notification.  This isn't strictly a requirement of the spec, but
> it seems like a good practice.  Once an application receives a reset
> notification for a context, it is supposed to destroy that context and
> start over.
>
> - If one context in an OpenGL share group receives a reset notification,
> all contexts in that share group must receive a reset notification.
>
> - All contexts in a single GL share group will have the same fd.  This
> isn't a requirement so much as a simplifying assumption.  All contexts
> in a share group have to be in the same address space, so I will assume
> that means they're all controlled by one DRI driver instance with a
> single fd.
>
> - The reset notification given to the application should try to assign
> guilt.  There are three values possible: unknown guilt, you're not
> guilty, or you are guilty.
>
> - If there are multiple resets between polls, the application should get
> the "most guilty" answer.  In other words, if there are two resets and
> the context was guilty for one and not the other, the application should
> get the guilty notification.
>
> - After the application polls the status, the status should revert to
> "no reset occurred."
>
> - If the application polls the status and the reset operation is still
> in progress, it should continue to get the reset value until it is
> "safe" to begin issuing GL commands again.
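
The last three requirements (most guilty wins, revert after a poll, keep reporting while a reset is in progress) can be modeled as a small per-context state machine. This is a minimal sketch in C. It assumes a hypothetical `struct reset_state` that is updated once per reset. The status names are stand-ins for the GL_ARB_robustness return values. Ranking UNKNOWN between INNOCENT and GUILTY is an assumption; the list above does not specify it.

```c
#include <stdbool.h>

/* Hypothetical statuses, ordered least to most guilty so that
 * "most guilty" is simply the larger value. Placing UNKNOWN between
 * INNOCENT and GUILTY is an assumption. */
enum reset_status {
    RESET_NONE = 0,
    RESET_INNOCENT = 1,
    RESET_UNKNOWN = 2,
    RESET_GUILTY = 3,
};

/* Hypothetical per-context state kept between application polls. */
struct reset_state {
    enum reset_status pending; /* most guilty status since the last poll */
    bool in_progress;          /* true until it is safe to issue commands */
};

/* Called once for every reset that affected this context. */
static void note_reset(struct reset_state *s, enum reset_status guilt)
{
    if (guilt > s->pending)
        s->pending = guilt;
}

/* Application poll: return the pending status, then revert to
 * RESET_NONE unless a reset is still in progress. */
static enum reset_status poll_reset(struct reset_state *s)
{
    enum reset_status result = s->pending;

    if (!s->in_progress)
        s->pending = RESET_NONE;
    return result;
}
```

Because the statuses are ordered numerically, "most guilty across several resets" reduces to taking a maximum.
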
>
> At some point I'd like to extend this to provide a slightly finer-grained
> mechanism so that a context could be told that everything after a
> particular GL sync (fence) operation was lost.  This could prevent some
> applications from having to destroy and rebuild their context.  This
> isn't a requirement, but it's an idea that I've been mulling.
>
> Each time a reset occurs, an internal count is incremented.  This
> associates a unique integer, starting with 1, with each reset event.
> Each context affected by the reset will have the reset event ID stored
> in one of its three guilt levels.  An ioctl will be provided that returns
> the following data for all contexts associated with a particular fd.
>
> In addition, it will return the index of any reset operation that is
> still in progress.
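
The reset counter scheme above could look roughly like this on the kernel side. This is a sketch only, written as ordinary C. `begin_reset`, `record_reset`, `end_reset`, and the globals are hypothetical names. Locking and counter wrap-around are ignored.

```c
#include <stdint.h>

/* Hypothetical guilt level assigned to a context for one reset. */
enum guilt { GUILT_NOT_GUILTY, GUILT_UNKNOWN, GUILT_GUILTY };

/* Per-context record mirroring the three fields in the RFC: each holds
 * the ID of the most recent reset at that guilt level, or 0 if none. */
struct ctx_reset_record {
    uint32_t guilty;
    uint32_t not_guilty;
    uint32_t unknown_guilt;
};

/* Global reset counter; 0 means "no reset", so IDs start at 1. */
static uint32_t reset_count;

/* ID of the reset currently being handled, 0 if none. */
static uint32_t reset_in_progress;

/* Start a reset: allocate the next ID and mark it as in progress. */
static uint32_t begin_reset(void)
{
    reset_in_progress = ++reset_count;
    return reset_in_progress;
}

/* Store reset `id` in the slot matching this context's guilt level. */
static void record_reset(struct ctx_reset_record *ctx, uint32_t id, enum guilt g)
{
    switch (g) {
    case GUILT_GUILTY:
        ctx->guilty = id;
        break;
    case GUILT_NOT_GUILTY:
        ctx->not_guilty = id;
        break;
    case GUILT_UNKNOWN:
        ctx->unknown_guilt = id;
        break;
    }
}

/* Finish the reset once it is safe to submit commands again. */
static void end_reset(void)
{
    reset_in_progress = 0;
}
```

Each slot is overwritten by the newer ID, so only the most recent reset per guilt level is kept. That is enough for user space, because any slot value newer than the last index it saw means a reset at that level has happened since the last poll.
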
>
> I think this should be sufficient information for user space to meet all
> of the requirements.  I had a conversation with Paul and Ken about this.
> Paul was pretty happy with the idea.  Ken felt that too little policy
> was in the kernel, and the resulting interface was too heavy (I'm
> paraphrasing).
>
> struct drm_context_reset_counts {

Some of the Radeon guys on #dri-devel already told me these are the 
wrong prefixes for something that's not a shared DRM interface.  I guess 
drm_i915_gem is the correct prefix?  It's been a long time since I last 
did kernel work. :)

>      __u32 ctx_id;
>
>      /**
>       * Index of the most recent reset where this context was
>       * guilty.  Zero if none.
>       */
>      __u32 guilty;
>
>      /**
>       * Index of the most recent reset where this context was
>       * not guilty.  Zero if none.
>       */
>      __u32 not_guilty;
>
>      /**
>       * Index of the most recent reset where guilt was unknown.
>       * Zero if none.
>       */
>      __u32 unknown_guilt;
> };
>
> struct drm_reset_counts {
>      /** Index of the in-progress reset.  Zero if none. */
>      __u32 reset_index_in_progress;
>
>      /** Number of contexts. */
>      __u32 num_contexts;
>
>      struct drm_context_reset_counts contexts[0];
> };
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
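
To check that the proposed snapshot gives user space enough to meet the requirements, here is a hedged sketch of the decoding step. It makes several assumptions:

- It mirrors the two structs locally, using uint32_t instead of __u32 and a C99 flexible array member instead of `[0]`.
- It expects the snapshot to be filled by hand, because no such ioctl exists yet.
- It uses the hypothetical helpers `alloc_counts`, `ctx_status`, and `poll_share_group`.
- `last_seen` is a per-share-group value that user space would keep.
- It reports the most guilty status from anywhere in the share group to every member. That is one possible policy, not something the RFC settles.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of the proposed structs (uint32_t instead of __u32,
 * C99 flexible array member instead of [0]). */
struct drm_context_reset_counts {
    uint32_t ctx_id;
    uint32_t guilty;        /* most recent reset where guilty, 0 if none */
    uint32_t not_guilty;    /* most recent reset where not guilty, 0 if none */
    uint32_t unknown_guilt; /* most recent reset with unknown guilt, 0 if none */
};

struct drm_reset_counts {
    uint32_t reset_index_in_progress; /* 0 if no reset is in progress */
    uint32_t num_contexts;
    struct drm_context_reset_counts contexts[];
};

/* Hypothetical status values, ordered least to most guilty. */
enum reset_status { RESET_NONE, RESET_INNOCENT, RESET_UNKNOWN, RESET_GUILTY };

/* Allocate a zeroed snapshot with room for n contexts. */
static struct drm_reset_counts *alloc_counts(uint32_t n)
{
    struct drm_reset_counts *c =
        calloc(1, sizeof(*c) + (size_t)n * sizeof(c->contexts[0]));
    if (c)
        c->num_contexts = n;
    return c;
}

/* Most guilty status for one context among resets newer than last_seen. */
static enum reset_status ctx_status(const struct drm_context_reset_counts *e,
                                    uint32_t last_seen)
{
    if (e->guilty > last_seen)
        return RESET_GUILTY;
    if (e->unknown_guilt > last_seen)
        return RESET_UNKNOWN;
    if (e->not_guilty > last_seen)
        return RESET_INNOCENT;
    return RESET_NONE;
}

static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Poll on behalf of a share group whose context IDs are in group[].
 * Returns the most guilty new status across the group and advances
 * *last_seen only once no reset is in progress. */
static enum reset_status poll_share_group(const struct drm_reset_counts *c,
                                          const uint32_t *group, size_t group_len,
                                          uint32_t *last_seen)
{
    enum reset_status status = RESET_NONE;
    uint32_t newest = *last_seen;

    for (uint32_t i = 0; i < c->num_contexts; i++) {
        const struct drm_context_reset_counts *e = &c->contexts[i];
        int in_group = 0;

        for (size_t j = 0; j < group_len && !in_group; j++)
            in_group = (group[j] == e->ctx_id);
        if (!in_group)
            continue;

        enum reset_status s = ctx_status(e, *last_seen);
        if (s > status)
            status = s;
        newest = max_u32(newest, max_u32(e->guilty,
                         max_u32(e->not_guilty, e->unknown_guilt)));
    }

    if (c->reset_index_in_progress == 0)
        *last_seen = newest;
    return status;
}
```

The sketch does not advance `last_seen` while `reset_index_in_progress` is nonzero. That is what makes repeated polls during a reset keep returning the same value. Once the reset finishes, the next poll moves `last_seen` forward and later polls revert to "no reset".
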


