String with numbers are coerced to DT_Decimal even if the destination type is a string, on COPY QUERY
Does the set of data values contain a fixed number of decimal points? I would assume that it does, if it is being identified as decimal, but your example contains no decimal digits, and would be identified as an integer if none of the values had a leading zero, and as a string if any of the values had a leading zero.
A sensible modification to execsql would be to evaluate decimai-like values or floating point-like values as strings if they have a leading zero and the second digit is not a decimal point, similar to the way that integer-like values are now handled. I'll make that change.
The general answer to your question is no: there is no way to tell execsql to coerce data to the data types of the destination table. This is because databases support many more data types than Python does. Execsql is designed to handle common data types, but more esoteric types (e.g., JSON, tsvector) are left as strings and converting them to the correct type needs to be done either by the DBMS or by custom code, by importing the data first into a staging table and then casting it to the proper data type when moving it to a base table.
One reasonable configuration option that might be useful is one that allows data type identification to be turned off entirely, so that everything is imported or copied as strings.
There was indeed a bug wherein integers with leading zeroes were leaking through to be evaluated as decimals. That has been fixed, and the latest version (1.81.0) also has a new configuration option ("only_strings") that suppresses all data type evaluation except for character data types.
Reply To rdnielsen
There was indeed a bug wherein integers with leading zeroes were leaking through to be evaluated as decimals. That has been fixed, and the latest version (1.81.0) also has a new configuration option ("only_strings") that suppresses all data type evaluation except for character data types.
Thank you very much! It seems to report the proper data type now :)
I've been chasing an odd bug recently, while attempting to copy data between two databases using COPY QUERY:
The above behaviour seems to be due to the way Column class works: it seems to evaluate the types as defined by the order in self.accums, going from the most specific to the least specific, meaning that a number-like thing will always be recognized as a number, even though it is intended to be a string.
Is there a way to tell execsql to use a specific schema (to be defined in the configuration or the script itself) or to respect the data types of the destination table, when doing COPY/COPY QUERY?