Base: fix incorrect field removal in criterion inside query design view

Mon Apr 22 22:50:57 PDT 2013

Hi,

I'm extremely sorry for my very late reply to this, it got buried
under a load of other stuff. Let's nail this down and get rid of the
problem.

(In future, if I'm unresponsive, feel free to prod me repeatedly) :)

On Wed, Jan 23, 2013 at 12:31:28AM +0100, Olivier Ploton wrote:
> On Mon, 21 Jan 2013 10:21:46 +0100 Lionel Elie Mamane <lionel at mamane.lu> wrote:

>>> However it fails on some (odd) examples : you cannot express an
>>> equation, see attached example. (production libo also fails with
>>> this example).

>> I'm confused. The two queries in the attached example work for me with
>> production LibO. What exactly does not work?

> Here is my production LibO: Version 3.6.2.2 (Build ID: 360m1(Build:2)) on 32 bit lubuntu 12.10, plus french localization.
> Here is the experiment in which a problem appears:
> * Open numeric.odb (I re-attached the same file)
> * Edit query named Equation (inside query view)
> * There is a criterion : "<= [number] / 2 + 10 / 2"
> * Pretend the criterion is modified (e.g. insert a space and validate)
> * The criterion rewrites into "<= / 2 + 10 / 2"
> * Run the modified query or swich to SQL view or even try to save
>   the query: you get some message like "SQL syntax error".

OK, reproduced. Actually, that example shows that "i ==
m_aChildren.begin()" is buggy, and rString.isEmpty() is better.

I've applied your patch. Thank you! (But please read on.)

> IMO There are 2 distinct questions about removing a name:
> Question (A) : is it the right place ? (a place where we may remove a name)
> Question (B) : is it the right name  ? (alone or qualified ? what about aliases ? etc.)

Exactly.

> *** About question (A) : what is a good place to remove a name :

> I first proposed to test if we are at the beginning, so CONDITION
> <=> (rString.getLength() == 0). I do not consider my proposal as a
> *solution* but as a *workaround*. Clearly, it forgets many
> simplifications,

I'd rather we simplify too little (which is a missing feature) than
too much (which is a bug). Unless someone comes up with a better test
in the future, "at beginning of string" is what we have now.

> and it seems quite robust (no need to anwser question (B), works
> fine with the mere field name),

I disagree. We still need to answer question (B), as per my example:

Field:             name
Table:             Table_2
Condition:         [Table_1].[name] = 5

gets rewritten to:

Field:             name
Table:             Table_2
Condition:         5

This is a bug. The name is at the beginning of the string, but it is
the wrong name. It should not be removed.

> So an enhancement of your CONDITION could be ((i ==
> m_aChildren.begin()) && ((this) corresponds to an operator of the
> previous list)). I did'nt try to implement it. Do you think it is
> worth trying ?

I was about to write "no, this will remove the fieldname nested deep
into expressions, which (I presume) we will fail to parse back", but
actually that is not true...

If one types:

Field:             name
Table:             Table
Condition:         [name] = 3 OR [name] = 5

this gets rewritten to

Field:             name
Table:             Table
Condition:         3 OR 5

Which works.

We "destroyed" this simplification with your patch, but that's
preferable to having the bugs like you showed in your "equation"
example.

My position is: we should strive to simplify to the simplest criterion
that we can parse back again, but no more. So the full answer lies in
the parser :) If you feel like investigating it, feel free, I take
other future patches, for example so that the above scenario is
simplified to ( 3 OR 5 ) gain, as well as cases such as ( < 3 OR 5 ).

(I'll strive to be more responsive in future.)

> *** About Question (B) : does a name refer to the right column : 

> * As you stated in one of your previous posts, we have to take into
>   account table names (which are possibly table aliases) : for a
>   criterion about field [name] in table [Table], [Table].[name] may
>   be removed but [OtherTable].[name] may not.

Right.

> * We also have to determine whether [name] alone refers to
>   [Table].[name] and may be removed or not.

In a WHERE clause (that is, in a criterion), as per SQL standard,
column names ALWAYS refer to *table* columns (that is, taken from a
table in the FROM clause), not *query* columns (that is, columns in
the SELECT clause).

So:

  SELECT "a" AS "d", "b" AS "e" FROM "Table" WHERE "e" <= 3;

is NOT Core SQL, and may give an error, because there is not column
"e" in table "Table".

I say "may" because AS AN OPTIONAL EXTENSION, a DBMS *may* allow query
columns to be referred to in the WHERE clause, but ambiguities have to
be resolved in favour of table columns, so that:

  SELECT "a" AS "b", "b" AS "a" FROM "Table" WHERE "a" <= 3;

The "a" in '"a" <= 3' refers to "Table"."a", NOT TO the query's "a",
which is "Table"."b".

> * What about a criterion inside computed columns, such as
>   UPPER([name]) LIKE '*FOO*' ?

Absolute rule: Don't remove it, unless the parser can parse it back.

Note: the parser currently cannot.

My opinion: don't remove it, it is not a desirable "simplification".

While '3' or '( 3 OR 5 )' or '( < 3 OR 5 )' are easily understandable
as "simplified" versions of:

[column] = 3
( [column] = 3 OR [column] = 5 )
( [column] < 3 OR [column] = 5 )

IMO,
 UPPER ( ) LIKE '*FOO*'
is not easily understandable as simplified version of:
 UPPER ( [column] ) LIKE '*FOO*'
so just leave it alone with the explicit column name.

> Consider a table named "Table" with (at least) 2 fields: a, b (say: integers).
> This simple query ("Simple") works fine:
>    SELECT "a", "b" FROM "Table" WHERE "a" <= 3

> Now rename field "b" into "a" using an alias. Protecting "a" by writing "Table"."a" has no effect ! and both columns are named "a" (query "Problem"):
>    SELECT "a", "b" AS "a" FROM "Table" WHERE "Table"."a" <= 3

This is a bug in HSQLDB. In your example, it should return

 a | a
---+---
 1 | 8
 3 | 6

but it returns

 a | a
---+---
 7 | 2

Just ignore it (and if you can, do your tests with another DBMS).

> Here is the only workaround I found; 2nd column gets renamed into "a1" (query "Workaround"):
>    SELECT "a", "b" AS "a" FROM (SELECT * FROM "Table" WHERE "Table"."a" <= 3)

Note that the renaming into "a1" is LibreOffice's work... If you
select "Run SQL command directly", then both columns are named "a".

> * do you agree with this example, or did I miss something ?

I don't understand the question.

> * is this behaviour common or is it proper to the embedded HSQL
>   database ?
> * is there really no way to access the "hidden" field ?

That the field is "hidden" in the WHERE clause is an HSQLDB
bug. Ignore it.

> * does this behaviour conform to some norm ?

See above, I explained the ISO SQL standard a bit.

> * should we accept aliases capturing field names ? or generate an error ? or work around ?

Not sure what the question is, or whether it becomes moot after my
explanations above. In Core ISO SQL, there is no alias capture problem
at the field level (but pay attention to *table* aliases).

About referring to query columns in the WHERE clause, we can support
that extension, but correctly. IMO, in increasing desirability
(later/lower in the list is better):

 - support the extension but fuckup disambiguation between table
   columns and query columns. (we are BUGGY)

 - do not support the extension (we do not have a feature we could
   have)

 - support the extension correctly:

   * a name that is ambiguous between a table column and a query
     column is ALWAYS the table column.

   * a name that is ambiguous between several table columns is AN
     ERROR. We don't necessarily have to detect that error, we can let
     the DBMS do that (unless we for some reason need to detect the
     error to not fuckup simplification or something like that).

 - As an optional extra feature added on top of "support the extension
   correctly":

   * however, when entering a criterion in the graphical query
     designer, in the following situation:

     Field:             name1
     Table:             Table1

     we could disambiguate [name1] to [Table1].[name1], even if
     [Table2].[name2] exists, because in the criterion of
     [Table1].[name1], the user probably meant [Table1].[name1],
     not [Table2].[name1]? Just an idea that floated in my head right
     now. AND OPTIONAL EXTENSION, we don't *have* to do that.

For table aliases, *never* allow / use / ... the "original" name of
the table, *only* the alias, which is the *only* acceptable name of
the table within the query!

-- 
Lionel