Incidencia #40935

String with numbers are coerced to DT_Decimal even if the destination type is a string, on COPY QUERY

Abrir Fecha: 2020-11-08 20:09 Última actualización: 2020-11-09 23:03

Informador:
(Anónimo)
Propietario:
(Ninguno)
Tipo:
Estado:
Cerrado
Componente:
(Ninguno)
Hito:
(Ninguno)
Prioridad:
6
Gravedad:
5 - Medium
Resolución:
Fixed
Fichero:
Ninguno

Details

I've been chasing an odd bug recently, while attempting to copy data between two databases using COPY QUERY:

  • My data source has a field, 'test_1', which is a string representing a number-like value (e.g. a VAT or a NIN or a ZIP code).
  • The destination table has a a field to hold that value, which is of the correct string type (i.e. VARCHAR).
  • The copy command seems to coerce the value in the source field to DT_Decimal first, then back to string. This causes the value to be evaluated as a number, which is not intended, before being converted again to string. For example, in the case of a VAT number having leading zeros, e.g. 0123456, it causes the data in the destination field to be the string "123456" (note the missing trailing 0).

The above behaviour seems to be due to the way Column class works: it seems to evaluate the types as defined by the order in self.accums, going from the most specific to the least specific, meaning that a number-like thing will always be recognized as a number, even though it is intended to be a string.

Is there a way to tell execsql to use a specific schema (to be defined in the configuration or the script itself) or to respect the data types of the destination table, when doing COPY/COPY QUERY?

Ticket History (3/5 Histories)

2020-11-08 20:09 Updated by: None
  • New Ticket "String with numbers are coerced to DT_Decimal even if the destination type is a string, on COPY QUERY" created
2020-11-09 00:09 Updated by: rdnielsen
Comentario

Does the set of data values contain a fixed number of decimal points? I would assume that it does, if it is being identified as decimal, but your example contains no decimal digits, and would be identified as an integer if none of the values had a leading zero, and as a string if any of the values had a leading zero.

A sensible modification to execsql would be to evaluate decimai-like values or floating point-like values as strings if they have a leading zero and the second digit is not a decimal point, similar to the way that integer-like values are now handled. I'll make that change.

The general answer to your question is no: there is no way to tell execsql to coerce data to the data types of the destination table. This is because databases support many more data types than Python does. Execsql is designed to handle common data types, but more esoteric types (e.g., JSON, tsvector) are left as strings and converting them to the correct type needs to be done either by the DBMS or by custom code, by importing the data first into a staging table and then casting it to the proper data type when moving it to a base table.

One reasonable configuration option that might be useful is one that allows data type identification to be turned off entirely, so that everything is imported or copied as strings.

2020-11-09 05:08 Updated by: rdnielsen
Comentario

There was indeed a bug wherein integers with leading zeroes were leaking through to be evaluated as decimals. That has been fixed, and the latest version (1.81.0) also has a new configuration option ("only_strings") that suppresses all data type evaluation except for character data types.

2020-11-09 17:24 Updated by: None
Comentario

Reply To rdnielsen

There was indeed a bug wherein integers with leading zeroes were leaking through to be evaluated as decimals. That has been fixed, and the latest version (1.81.0) also has a new configuration option ("only_strings") that suppresses all data type evaluation except for character data types.

Thank you very much! It seems to report the proper data type now :)

2020-11-09 23:03 Updated by: rdnielsen
  • Resolución Update from Ninguno to Fixed
  • Estado Update from Open to Cerrado

Attachment File List

No attachments

Editar

You are not logged in. I you are not logged in, your comment will be treated as an anonymous post. » Entrar