Utype syntax and character set
- Data base manager may prefer utypes in lower-case when stored in a database.
- How is it compatible with application using together Utypes as defined in previous data models (SSA, Characterisation DM) this new proposed format. we need to query , but also to 'read' and interpret the query response and the Utypes coming back.( ML)
M.Louys 2011/02/11
Dear all,
In a recent telecon for Obstap draft editing, we chose to shorten top elements.
Observation. --> Obs.
Characterisation --> Char.
We consider Curation , DataID, Access, etc. as top elements as in the Spectrum DM.
What I tried to introduce in the last version of the Utype Note was to allow automatic generation from UML to Utype list and xml schema.
This requires that we adopt a few rules for writing Utype strings:
- use lower case for first letter of attributes names
Examples
* Obs.calibLevel
* Obs.dataproductType
- use the name of the role when a class has a reference or an association to some other class. Code this role in lowercase.
This is not always easy to read , so this is why the CamelCase writing was used in most DM text.
examples: Char.spatialAxis.coverage.bounds.extent
I think we can adopt the fact that a machine , a parser always sees the Utypes in lower case , and that we use CamelCase in standards definition only in a parcimonious way for new models like ObsTAP , and as defined in existing models like Spectrum and SSA.
I am currently editing a new version of the tables in the draft to follow these lines and give an idea of the simple utype shape proposed.
Could we agree on a simple strategy like that? Other suggestions?
Mireille
P.Dowler 2011/02/11
>
Could we agree on a simple strategy like that? Other suggestions?
Working from the end back to the source, the ObsCore document should give the
utypes exactly as they should be set in the tap_schema. As values stored in a
RDBMS, we have to be aware that implementors have to actually insert the
values there and make sure that some value ends up in the VOTable output, so
there is no sense giving them more things to think about: we should be
explicit about the exact value.
In my opinion, it is possible to define simple rules for going from UML to
utypes and we should follow those rules. If there is value in making utypes
readable, then the rules should produce readable utypes everywhere (eg. that
may be a use case for the utype spec).
If, in the utypes spec, utypes are considered to be case-insensitive, we
should be aware that such a choice will force people to add extra code
(utype.equalsIgnoreCase(str) instead of utype.equals(str) in java,
LOWER(tap_schema.columns.utype) = 'abc.def' in
ADQL). And it will always
introduce extra decision making and confusion when people have to setup their
services.
my 2c,
D.Tody to Mireille 2011/02/21
Hi -
I think this looks reasonable; it is nice to have a well defined and
consistent scheme. The "role" business is pretty subtle however; we
should be able to do this consistently in our specifications and
reference implementations, but it is likely that users will often miss
these subtleties and get the case wrong.
Re Utypes being case-insensitive: I continue to think this is needed,
although as Pat notes it will complicate the code slightly. But it will
also improve reliability at the cost of an extra line of code (or merely
a function call) here and there.
The thing about Utypes is that this is the mechanism we use to refer to
data model elements all the way up in science code, with layer after
layer of software, the Internet, etc. between the application and the
eventual remote data service or DBMS. Aside from the user/scientist or
applications programmer being careless about case, there are many
opportunities for errors or translation issues. If this were all
controlled system software we might be ok to have it be case sensitive,
but given the amount of uncontrolled software in the path I think it
would be foolish to not plan on having the occasional toLowerCase() or
whatever in there for string comparisions when looking for Utypes. So
yes, Utypes should be defined as case insensitive, but we should use
case to improve readability.
- Doug
A.Micol to Doug 2011/02/22
The theory:
About string values stored in a database, the principle should be to
always use the same case, whether lowercase or uppercase.
That way it is clear to the user what to use when formulating a query,
and it is always possible for the DBMS to optimse the query by using
the relevant indeces. The (performant) query will look like this:
SELECT * FROM mytable
WHERE mystring = 'all my mystring values are lower-case.'
If instead a mix-case is adopted, and different people use different rules
(UpperCamelCase vs lowerCamelCase), the only way to make a query working in
all cases is to use a function, like in Pat's example:
SELECT * FROM myTable
WHERE LOWER(mystring) = 'mystring values use some kind of camel-case.'
which does not allow the DBMS to use the index on the column utype. In that
case a table scan is performed by the DBMS, introducing an unnecessary
performance penalty.
The practice:
In the typical realistic case the only queries involving utypes will be
"discovery" queries trying to see which table contains which field
for a given utype.
That is achieved by querying the TAP_SCHEMA.columns table.
Such table is very likely a short table with a number of records
(= number of all columns in all tables served through a given TAP service)
that is not going to exceed by much a couple of thousands entries in the
typical case. With so few records there is no point in creating an index,
a table scan suffices, with no performance penalty, and the query:
SELECT * FROM TAP_SCHEMA.columns
WHERE LOWER(utype) = 'char.spatialaxis.coverage.resolution.resolutionrefval'
is as good.
Therefore in this particular, and typical, case it does not really matter
if utypes are stored as lower-case or not.
But there could be other cases!:
The real question is: are there other cases where querying by free-case utypes
could affect performance?
If that is the case, and we need to query many more (>1E5 or 1E6) records,
then I think there is no option available: there will be a compelling
reason to force lower-case utypes into our databases.
Counter examples?:
VIZIER:
Probably even the case of Vizier is not compelling enough:
Suppose VIZIER hosts 10,000 catalogues, with -say- 50 columns each:
that would total a number of 500,000 records in TAP_SCHEMA.columns.
I think that is not a huge number of records to scan through with modern
hardware, even without indeces.
"event" tables:
A more compelling (though probably rare) scenario could be a kind of
event table (sorry, a better name must exist for what I have in mind)
of the kind here illustrated:
Consider a table that contains all header keywords
of all (single extension FITS for the sake of simplicity) files
in a given archive:
the "id" uniquely identifies a file in the archive,
the "name" uniquely identifies a keyword in the file header
the "value" is the value of that keyword for that "id"
the "utype" is the utype associated to that keyword
In this case, the table could be considerably long, and contain millions
of records.
If the utype is not forcefully single (eg lower) case, then the
performance penalty, when querying, could be huge.
Conclusions:
I'm not stating that the "event" table above is the best system, but
it is
a system that data providers might be using.
What I'm trying to say is that:
* not forcing a single case for utypes in a database *
* might have an impact on data providers' architecture *
and that is not nice.
Therefore,
I vote for LOWER-CASE
UTYPES WHEN STORING THEM IN A DATABASE,
throughout the entire VO.
That is, to me this is a necessary requirement for healthy performance
of the VO in general, and should be a requirement expressed in the
UTYPE standard itself, something like:
if someone has to ingest a VOTable into a DBMS,
at INSERT time, s/he will have to always use the LOWER function
on the utype attribute.
Alberto
PS: the client could always render the output of a query in a more readable way.
P.Dowler 2011/02/22
>
Therefore,
>
> I vote for LOWER-CASE UTYPES WHEN STORING THEM IN A DATABASE,
>
> throughout the entire VO.
I also vote for this option; it means one more thing is explicit and one less
thing that implemetors (and users) have to think about and decide.
D.Tody to Pat and Alberto 2011/02/22
I don't think we should let the limitations of DBMS technology force
us to use monocase for all VO metadata (which is where this appears to
be headed). It reminds me of the old teletype days WHERE EVERYTHING
WAS UPPER CASE.
Utypes should be case-insensitive for the reasons noted in my
earlier email, as well as to conform to existing documentation and
implementions, none of which have had any problem whatsoever with
Utypes being case insensitive. Also just to be more user friendly
and readable.
It seems to me this is mainly just a technical issue of how to
perform an efficient string search/comparison in the DBMS. It may
well be DBMS-specific. I just did a quick check of MySQL and Oracle
for example, and there is quite a bit online re case sensitive or
insensitive comparison with these two DBMSes. I have not looked into
it further, but it is clear that this is a fairly common issue.
Another issue is how common it will really be to search archives
by UType directly in
ADQL queries. We may find it more common for
example to read in the TAP_SCHEMA and compose queries on the client
side, using table column names. Or perhaps there is a translation
layer on the server side which comes into play when we start doing
higher level data model manipulation or queries.
Note this is not really an issue for ObsTAP, since we are already
using fixed (lower case) column names in this use case.
- Doug
D.Tody 2011/02/22
Here is one example:
http://venutip.com/content/case-sensitivity-mysql-do-you-really-need-it
Case sensitivity in MySQL (do you really need it?)
Submitted by venutip on July 24th, 2008
Not sure how this escaped my notice until now, but today
I realized the default collation for MySQL columns is not
case-sensitive. Among other things, this means that if you're
searching for distinct values on a column with varchar, text,
or any other non-binary data type, values that differ only in
their cases (such as "apple" and "APpLE") are considered equal,
and you won't get both values back. MySQL will return "apple" or
"APpLE", but not both.
So in this case the collation determines whether string data is
case sensitive or not, and evidently an index can be generated on
a character column with a case insenstive collation.
- Doug
F.Bonnarel to group of editors 2011/02/28
Hi All,
Sorry if I am putting some divergence in the discussion here but I will vote
AGAINST utypes being forced lower case into DBMS.
If it is to be usefull for interoperability ACROSS services utype definition should
allow some rule of construction and I think the one in Mireille's note seem to be the
best. If it is consistent with UML it will be consistent with xml serialization and
Java or C++ code generation where classes and Methods are generally using camelCase.
For those of developpers and users not aware or oblivious of these rules, published
utypes tables could be an absolute reference and this can be installed case sensitivitly in the
column description of the TAP schema without any difficulty. So for sure they will appear in the
Query response VOTABLE and this seems very important to make them understandable.
For future queries by utypes (
ADQL or
PQL) -which is not the intention of the current
ObsTAP draft, as we query only by column names - but is surely critical for IVOA-wide interoperability
in the future I am still not sure if we have to allow case insentiveness and add the piece of code Doug is
proposing or force the query to be written with CamelCase utypes. But this can still
be discussed after the draft release as long as we correctly fill the TAP_SCHEMA.
Cheers
François
A.Micol to François 2011/02/28
Hi Francois,
While some RDBMes (as Doug mentioned) are natively case-insensitive (e.g. Postgres, MySQL), others are not (e.g. Sybase).
Also, XPath statements are case sensitive.
It is always possible to get around this using some DBMS function (e.g. to lower-case) prior the comparison.
For example:
lower(utype) = "char.bla.bla.bla.location"
or
/bookstore/book[translate(
@category, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = '" + searchtext.ToLower() + "']
The fact is that such conversion will be needed by all services, clients, servers, and databases (e.g. at ingestion time)
to make sure that the proper match is found.
That is certainly doable, but it is:
(1) not performant (cannot use indeces in RDBMSes),
(2) prone to error,
(3) all software will need to add utype-case management, making it a bit more complicated than necessary.
Even checking if the utype is spelt with the correct case is something that will be required.
The motto is always: Keep it simple.
I do not see the reason to deviate from simplicity, if nothing else is gained other than presumed
readibility for developers that need to read a utype just for the time they invest in developing a tool.
Alberto
PS: I note that this is the first DAL service which does not utilise UCDs. I personally think
that neither UTYPEs are that relevant for this exercise (but I reserve to make my case at a later time).
D.Tody to Alberto 2011/02/28
>
The fact is that such conversion will be needed by all services, clients, servers, and databases (e.g. at ingestion time)
>
to make sure that the proper match is found.
Right - for Utype matching to work reliably given the many layers of
softtware and languages involved (and Utypes are NOT just an ObsTAP
issue), case insensitive comparison will be needed. Consider the
alternative. Instead we have case sensititive Utypes; we query to see
if data supports a particular Utype, but do not find it due to a case
issue, even though the Utype is there and it is obvious to a human
looking at the data that we have a match. An important bit of data
model is ignored in the analysis and possibly results in subtle analysis
errors. This is unacceptable merely because it is inconvenient to do
a case-insensitive match in some cases of software implementation.
>
That is certainly doable, but it is:
>
(1) not performant (cannot use indeces in RDBMSes),
This is not true Alberto. In the cases I mentioned, where case
insensitive character data is supported, the data can still be indexed.
If the DBMS does not support this, you can do it in your ObsTAP
implementation (not that we even search by Utype of course), by storing
data internally in lower case and forcing the externally-supplied Utype
value to lower case when you translate the
ADQL string. For a
PQL query
it will be no problem of course since the conversion can easily be
applied internally.
>
(2) prone to error,
NOT doing a case-insensitive compare to look for Utype tags is what is
prone to error (see above).
>
(3) all software will need to add utype-case management, making it a bit more complicated than necessary.
If the DBMS or other technology supports case-sensitive comparisons (and
virtually all do) then little or no extra code is required, e.g., it is
supported natively by the DBMS.
>
The motto is always: Keep it simple.
>
I do not see the reason to deviate from simplicity, if nothing else is gained other than presumed
>
readibility for developers that need to read a utype just for the time they invest in developing a tool.
So basically this is generalizing into an argument that all string
metadata in VO which is subject to comparision be lower case, even all
the way up into user interfaces. It won't end with Utype of course.
And you are casually suggesting that we invalidate several years of
practice and standarization within VO, where Utypes have long been
considered to be case-insensitive.
- Doug
P.Dowler to Alberto 2011/02/28
>
Hi Francois,
>
>
>
> While some RDBMes (as Doug mentioned) are natively case-insensitive
>
> (e.g. Postgres, MySQL), others are not (e.g. Sybase).
It is simply not true that postgres and MySQL are case-insensitive. Doug
pointed out that one can set the collation in MySQL to do that (and the
document he quoted claimed this was the default behaviour), but it is not true
in postgresql (by default).
In several servers I did look into this (back during the
ADQL spec days) it is
usually a global setting that effects all queries in all databases so
implementors will generally not be free to change such a setting. Thus, it is
much better to assume that string comparisons in TAP are case-sensitive:
because
ADQL says they are and because they are in most RDBMSs.
That is not saying that utypes are case-sensitive or not. I'm just talking
about the underlying tools. It is up to the utype spec to define the rules for
utypes and then up to other systems to adapt; TAP is only one of these other
systems.
D.Tody to Pat 2011/02/28
>
On 2011-02-28 07:45:24 Alberto Micol wrote:
>
> Hi Francois,
>
>
>
> While some RDBMes (as Doug mentioned) are natively case-insensitive
>
> (e.g. Postgres, MySQL), others are not (e.g. Sybase).
>
>
It is simply not true that postgres and MySQL are case-insensitive. Doug
>
pointed out that one can set the collation in MySQL to do that (and the
>
document he quoted claimed this was the default behaviour), but it is not true
>
in postgresql (by default).
See PostgreSQL CITEXT data type:
http://www.postgresql.org/docs/8.4/interactive/citext.html
>
In several servers I did look into this (back during the ADQL spec days) it is
>
usually a global setting that effects all queries in all databases so
>
implementors will generally not be free to change such a setting. Thus, it is
>
much better to assume that string comparisons in TAP are case-sensitive:
>
because ADQL says they are and because they are in most RDBMSs.
In the cases I looked into, this could be set individually for a
table field.
>
That is not saying that utypes are case-sensitive or not. I'm just talking
>
about the underlying tools. It is up to the utype spec to define the rules for
>
utypes and then up to other systems to adapt; TAP is only one of these other
>
systems.
I agree that the issue of Utype being case-insensitive should be
separated from implementation issues, e.g., how to manipulate Utypes in
TAP. However although it has not been openly discussed, it is clear
that storing Utypes in all lower case in a DBMS and then presenting them
to a user and to client applications thusly in a TAP_SCHEMA query will
defacto try to force them to be monocase so that case-sensitive
comparisions can be used.
As I noted in earlier mail, we have a serious reliability issue if we
try to trust to case sensitive comparisions when looking up data model
elements. There is no way to distinguish "not found" from a case
confusion issue. There is no question that it is safer to ensure that
case insensitive comparision is used.
- Doug
D.Tody to editors 2011/03/02
I think we ought to start over with a clear statement of the issue and
what is being suggested. That is, VO standards and current documents
and practice have stated for years that Utypes are case-insensitive.
However it has been suggested that Utypes be "stored in a DBMS in lower
case" to facilitate comparision.
What is really being suggested here? Is this merely an implementation
issue regarding how we manipulate Utypes in TAP, or are people
suggesting that we reverse current standards and declare Utypes to be
case sensitive and specified in all lower case? Will this spread to
other forms of text which we may wish to search or compare? Title
strings? Proposal cover pages? Do we replace all camel case text usage
with underscore? Where will this end up if we go down this path?
If this were merely an implementation issue then how we store Utypes in
a DBMS is up to the TAP implementation. Most DBMSes apparently support
case-insenstive comparision so this could be handled entirely within the
implementation (differently in each case). For others it might be
necessary to store utype text in lower case and convert string values on
input and output to preserve case while doing an explicit case sensitive
compare internally. Again, this would be an internal implementation
issue.
Note that while mixed case text comparision technology may be in a
primitive state in DBMSes, it is quite advanced generally, e.g., Google,
Apache Solr, and so forth. Ultimately we will need to use these
technologies for text and keyword searching; I really don't think we
will end up forcing all text to lower case. If so then we eventually
have to bite the bullet anyway, and we should define what more limited
problem we are trying to solve here.
If it is not merely an implementation issue then it could have global
consequences as noted above. I suggest we need to clarify what the
actual proposal is here.
- Doug
D.Tody to editors 2011/03/02
I do not think we have adequately defined what
is being proposed. We should back up and clarify what the actual
proposal is and not get lost in the technical details.
BTW, here is a simple solution to doing case-insensitive search on Sybase:
http://stackoverflow.com/questions/81268/case-insensitive-search-on-sybase
As Alberto notes Sybase does not directly support CI-search (except
evidently at a global per-database level), however it could be easily
implemented for a specific column with the approaches shown. Again,
this appears to be merely an implementation issue.
Utype however is not required to do ObsTAP
ADQL queries so all such a
service actually needs to do is store and return the column metadata,
so it is not clear why this is even an issue.
- Doug
A.Micol to Doug 2011/03/03
>
>
BTW, here is a simple solution to doing case-insensitive search on Sybase:
>
http://stackoverflow.com/questions/81268/case-insensitive-search-on-sybase
>
That is a quite obvious solution to the problem of not forcing lowercase utypes.
Sure, solutions can be found, but the question is: why?
I mean, eradicating the problem is not better than having to solve it?
A simple (lowercase) choice removes the problem.
Technically, the solution that Doug's pointed us to
requires all data providers to implement 2 columns,
- one for the camelCase utypes to be used in the SELECT part,
- one for the lowercase utypes to be used in the WHERE clause.
This means doubling the size required to host the already long utypes,
which could be an issue for already packed tables (packed in the sense
of being close to the row size limit imposed by the DBMS page size).
See e.g.
http://dbaspot.com/forums/sybase/411958-warning-row-size-2015-bytes-could-exceed-row-size-limit-1962-bytes.html
which shows that the row size limit on a sybase DBMS set to use a 2K page size
is only 1962 bytes.
>
As Alberto notes Sybase does not directly support CI-search (except
>
evidently at a global per-database level), however it could be easily
>
implemented for a specific column with the approaches shown. Again,
>
this appears to be merely an implementation issue.
>
My point is that I do consider the VO "merely an implementation issue"!
IVOA is not inventing new magic algorithms that perform science on behave
of the astronomers; IVOA wants to provide a working infrastructure to facilitate
astronomers' lives. Isn't that just purely an implementation issue?
Making any implementation more complex than needed is what has stopped
implementors so far. Just ask yourself the question: how many important
data providers have not yet published their data in the VO? I do not need
to give names here, you know them all.
Hence: Keep it simple!
That is the recipe.
And it should not be seen as a problem the fact that an overly-simple
DAL protocol does not cover 100% of the data (or maybe not even 90%).
So far astronomers can find data through the VO for much less than 90%
of the important data providers (not to speak of the small ones).
Let's help data providers to publish in the VO: simplicity is key!
Alberto
D.Tody to Alberto 2011/03/03
Hi Alberto -
Thanks at least for the nice response. I think I got part of my
answer to clarifying this proposal - you guys really do want to
overturn 5 years of VO practice and standards and declare Utype to be
case-sensitive and monocase. It still isn't clear if this proposal
would then extend to any other text we want to compare or search,
whether doing away with camelcase would mean adding underscores
for readability, whether we would have to reissue all the older VO
standards, etc. If you want to seriously propose such radical change
the idea would need to be fleshed out to see what the full impact
would be if implemented uniformly.
In the meantime here is a simpler proposed solution:
1) Add a CITEXT type to
ADQL in a future version (as in PostgreSQL).
2) Ignore this issue for ObsTAP 1.0 since it does not provide any
means nor have any requirement to search Utypes in any case.
It is quite handy to have the ability to do case-insensitive searches
in a DBMS, and so far we have found this to be fairly straightforward
to do for every DBMS examined. The approach does vary, however hiding
these differences in SQL implemenations is exactly what
ADQL is for.
There would be no need to clutter up every ObsTAP implementation
as you note below. Utype text will compress extremely well (being
very repetitive) hence so long as the DBMS supports record and index
compression there will be little impact on record size. In any case
for most tables this mainly affects the TAP_SCHEMA, not the actual
data records, and the schema definition is not large.
>
And it should not be seen as a problem the fact that an overly-simple
>
DAL protocol does not cover 100% of the data (or maybe not even 90%).
>
So far astronomers can find data through the VO for much less than 90%
>
of the important data providers (not to speak of the small ones).
The global multiwavelength search, while essential for VO, is only one
use case (although the basis of all the ObsTAP use-cases as you noted
on the TWiki). Using the VO framework to build VO-enabled science
archives is another; this is where the extensibility requirement
comes in. This latter use case is very interesting as well, as it
would get VO technology directly in the workflow for the average
astronomer getting data from various archives and data centers, while
integrating this type of usage directly with VO, providing a seamless
path from routine archive data access to VO-enabled analysis, since
both would be based upon the same core technology. Otherise VO is
just an add-on and might not get used at all, since there are other
and often better archive-specific ways to do all these things.
- Doug
M.Louys to editors 2011/03/03
Dear all ,
My point of view is that it is difficult to decide whose life we want to get simpler:
- The one from the user?
- The one from people building interoperable services ( web oriented) that need to use valuable tags to check their information ?
- The one from data base manager that head for high performance searches ?
There is definitely a compromise to find and it is beyond the scope of ObsTAP, more linked to the definition of Utypes.
...
T.Boch - 2011-03-03T21:08:17
Hi François,
I don't really get the point of the discussion about case (in)sensitivity of utypes. Why can't we just state that :
"UTypes are case insensitive. 'abc.def' is the same as 'ABC.DEF' which is the same as 'aBc.DEf', etc"
For efficiency, service implementors would store utypes in lower case in their DMBS, and convert the user utype query to lower case before performing the comparison.
As an application developer used to interact with services implementing and interpreting standards in "original" ways, I would write defensive code and force the string comparison to be case insensitive whatever the standard says.
Cheers,
Thomas
A.Micol 2011-03-04T10:28:00
The proposal by Doug to add a CITEXT type to
ADQL in a future version (as in PostgreSQL) wouldn't work. That is a setting at the server level, over which a developer does not have control in a big organisation like ESO.
I agree with Thomas: lowercase utypes in the DBMS!
Though, I want to highlight the fact that in such case
the VOTable response will always contain lowercase utypes.
--
JuanDeDiosSantanderVela - 07 Mar 2011, 14:44 UTC
Dear all,
if UTypes are to be compared case-insensitively, and that is how UTypes where defined, I consider that should not go into the ObsTAP document. If the way a given services does that is by storing lowercase UTypes in the DB, and converting the whole
ADQL expression into lowercase before evaluating, so be it, but I do not think this has to be mandated by ObsTAP.
--
RandyThompson - 08 Mar 2011
I agree with Thomas. I would say
UTYPES shall be treated as case-insensitive strings, favoring lower-case for efficiency. We routinely convert all user-input to lower-case before formulating a database query.
--
FrancoisBonnarel - 15 mar 2011
I follow all people telling that the query must be case insensitive. But if you store the utypes in small letters in the TAP schema what about the query RESPONSE ? Wherefrom does the system know the correct syntax for further interoperability ?
--
PatrickDowler -- 2011-03-23
I want to withdraw my previous position on requiring lower case utypes in the database. Since utypes are not case-sensitive throughout the VO, that approach will just encourage people to rely on a local convention rather than write correct code (apps, queries, etc). Also, TAP does not mandate lower-case utypes in general and TAP services with ObsCore table will undoubtedly have other tables.
We have many places where the underlying DB does not support exactly what the user requests via TAP+ADQL and it is the job of the service implementor to make these things work and hide their own implementation details. The idea from Doug to add something like CITEXT to a future
ADQL is a good one, but more importantly it is going in the right direction: add a feature to TAP+ADQL so it works with the real-world usage and provide information to the user to help them write the best query.
There are several ways to implement CI search and good TAP services will do what they need to do to make it work (CITEXT type, index on function return value, redundant columns, etc). We should note to users that since utypes are not case-sensitive they should use LOWER(someUtypeColumn) in the where clause when someUtypeColumn is of type adql:VARCHAR. If a specific service has a smallish TAP_SCHEMA, they don't have to do anything. If anyone uses utypes specifically in columns where there are many rows, then they probably want to optimise that to provide a good service, but that is the job of the implementor.
To make the document clear, I would use readable utypes (camelCase). I personally will make sure that my services output readable utypes. In the sample SQL that inserts the correct TAP_SCHEMA content I would just use the camelCase values.