[PATCH v6 12/12] x86/traps: Fix up invalid PASID

Mon Aug 3 15:19:21 UTC 2020

On Mon, Aug 03, 2020 at 08:12:18AM -0700, Andy Lutomirski wrote:
> On Mon, Aug 3, 2020 at 8:03 AM Dave Hansen <dave.hansen at intel.com> wrote:
> >
> > On 7/31/20 4:34 PM, Andy Lutomirski wrote:
> > >> Thomas suggested to provide a reason for the #GP caused by executing ENQCMD
> > >> without a valid PASID value programmed. #GP error codes are 16 bits and all
> > >> 16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
> > >> choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
> > >> when loading from the source operand, so its not fully comprehending all
> > >> the reasons. Rather than special case the ENQCMD, in future Intel may
> > >> choose a different fault mechanism for such cases if recovery is needed on
> > >> #GP.
> > > Decoding the user instruction is ugly and sets a bad architecture
> > > precedent, but we already do it in #GP for UMIP.  So I'm unconvinced.
> >
> > I'll try to do one more bit of convincing. :)
> >
> > In the end, we need a way to figure out if the #GP was from a known "OK"
> > source that we can fix up.  You're right that we could fire up the
> > instruction decoder to help answer that question.  But, it (also)
> > doesn't easily yield a perfect answer as to the source of the #GP, it
> > always involves a user copy, and it's a larger code impact than what
> > we've got.
> >
> > I think I went and looked at fixup_umip_exception(), and compared it to
> > the alternative which is essentially just these three lines of code:
> >
> > > +     /*
> > > +      * If the current task already has a valid PASID in the MSR,
> > > +      * the #GP must be for some other reason.
> > > +      */
> > > +     if (current->has_valid_pasid)
> > > +             return false;
> > ...> +  /* Now the current task has a valid PASID in the MSR. */
> > > +     current->has_valid_pasid = 1;
> >
> > and *I* was convinced that instruction decoding wasn't worth it.
> >
> > There's a lot of stuff that fixup_umip_exception() does which we don't
> > have to duplicate, but it's going to be really hard to get it anywhere
> > near as compact as what we've got.
> >
> 
> I could easily be convinced that the PASID fixup is so trivial and so
> obviously free of misfiring in a way that causes an infinite loop that
> this code is fine.  But I think we first need to answer the bigger
> question of why we're doing a lazy fixup in the first place.

We choose lazy fixup for 2 reasons. 

- If some threads were already created before the MSR is programmed, then
  we need to fixup those in a race free way. scheduling some task-work etc.
  We did do that early on, but decided it was ugly. 
- Not all threads need to submit ENQCMD, force feeding the MSR probably
  isn't even required for all. Yes the overhead isn't probably big, but
  might not even be required for all threads.

We needed to fixup MSR in two different way. To keep the code simple, the
choice was to only fixup on #GP, that eliminated the extra code we need to
support case1.

Cheers,
Ashok