
gdal.VectorTranslate() transforms non-ASCII characters?

10 hours ago
AlfredBaldenweck
MVP Regular Contributor

I'm trying a workflow using gdal.VectorTranslate(), since ogr2ogr isn't there anymore.

The original data contains non-ASCII characters, but they are replaced by � when translated.

AlfredBaldenweck_0-1766169224937.png

AlfredBaldenweck_1-1766169237494.png

How can I make this not happen?

I tried setting the config options, but "OGR_FORCE_ASCII" is only used by certain drivers and processes. Similarly, I cannot find a Translate option that would appear to take care of this.

This is kind of a major thing. If I absolutely have to, I suppose I can get the strings from the original data, but that will majorly slow things down, not to mention complicate things. 
Thanks

Edit: It appears that this depends on the source; I have no problems going from fGDB to fGDB, but the real data is in an MDB.

2 Replies
AlfredBaldenweck
MVP Regular Contributor

It seems that pyodbc can read it just fine, which is super frustrating. I can't figure out how to get the data into a different format without downloading a new driver, and that doesn't work when distributing this workflow to various users (unless ?)

For context, this is what gdal is showing me for the strings:

bytearray(b'S\xbdNE\xbc')

I'm not sure how some tools can read these correctly as fractions while others can't.

I was able to convert that string to hex, which I fed to a converter online and got the desired output, and then the converter immediately broke when I tried doing it again. 

53bd4e45bc

Putting the string with the fractions into the same converter gives me this:

53 c2 bd 4e 45 c2 bc

As you can see, my bytes are missing something: there is no c2 byte before each bd and bc.

Doing a bytes.fromhex().decode() fails because of an "invalid start byte".
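The two hex dumps above can be reproduced in a few lines. This is a minimal sketch that assumes the MDB stores its text in Windows-1252 (cp1252); that code page is an assumption, not something confirmed for this dataset. In cp1252, 0xBD and 0xBC are ½ and ¼, while in UTF-8 those same bytes are continuation bytes that need a 0xC2 lead byte in front of them.

```python
raw = bytes.fromhex("53bd4e45bc")  # hex dump of the value GDAL returned

# As UTF-8, 0xBD is a continuation byte with no lead byte before it,
# which is what triggers the decode error.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason

# Assumption: the source text is Windows-1252, a single-byte code page.
text = raw.decode("cp1252")

# Re-encoding the decoded text as UTF-8 inserts the 0xC2 lead bytes.
utf8_hex = text.encode("utf-8").hex(" ")

print(utf8_error)
print(text)
print(utf8_hex)
```

If the assumption holds, the bytes are not corrupted; they are valid single-byte text that is being interpreted as UTF-8.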

I'm kind of out of ideas here, so if anyone has any, I'd really appreciate it.

AlfredBaldenweck
MVP Regular Contributor

Okay, with the help of this post, I'm able to read it correctly.

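import codecs
from osgeo import ogr

# in_ds is the path to the source MDB
inmdb = ogr.Open(in_ds)
sql = "SELECT TextString FROM LDAnno"
res = inmdb.ExecuteSQL(sql)
for r in res:
    # Get the raw bytes of the field as a hex string
    t = r.GetFieldAsBinary(0).hex()
    # Turn the hex back into bytes, then decode with the Windows ANSI code page
    # ('dbcs' is an alias for 'mbcs' and only exists on Windows)
    t = codecs.decode(t, "hex").decode('dbcs')
    print(t)
inmdb.ReleaseResultSet(res)  # free the SQL result layer

S½NE¼ W½SE¼ E½SW¼ E½SW¼ W½NE¼ W½NE¼ S½NW¼ N½SW¼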

The question remains: how do I force gdal to use that decoding instead of doing its own thing?
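One possible workaround, sketched under assumptions rather than as a confirmed GDAL feature, is to skip gdal.VectorTranslate() for the affected layer and copy the features by hand, decoding the text fields on the way. The helper names `decode_legacy` and `copy_layer_decoded`, the GPKG output driver, and the cp1252 default are all hypothetical choices for illustration; whether a translate or open option can do this directly for MDB sources is not established here.

```python
def decode_legacy(raw, encoding="cp1252"):
    """Decode raw field bytes. The cp1252 default is an assumption."""
    return bytes(raw).decode(encoding, errors="replace")


def copy_layer_decoded(src_path, dst_path, layer_name, text_fields,
                       encoding="cp1252", driver_name="GPKG"):
    """Hypothetical helper: copy one layer, re-decoding the given text fields."""
    from osgeo import ogr  # imported lazily so decode_legacy works without GDAL

    src_ds = ogr.Open(src_path)
    src_layer = src_ds.GetLayerByName(layer_name)
    src_defn = src_layer.GetLayerDefn()

    # Create the output layer with the same geometry type and fields.
    dst_ds = ogr.GetDriverByName(driver_name).CreateDataSource(dst_path)
    dst_layer = dst_ds.CreateLayer(layer_name, src_layer.GetSpatialRef(),
                                   src_defn.GetGeomType())
    for i in range(src_defn.GetFieldCount()):
        dst_layer.CreateField(src_defn.GetFieldDefn(i))

    for src_feat in src_layer:
        dst_feat = ogr.Feature(dst_layer.GetLayerDefn())
        dst_feat.SetFrom(src_feat)  # geometry plus attributes, matched by name
        for name in text_fields:
            idx = src_feat.GetFieldIndex(name)
            if src_feat.IsFieldSetAndNotNull(idx):
                # Overwrite the copied value with the explicitly decoded text.
                raw = src_feat.GetFieldAsBinary(idx)
                dst_feat.SetField(name, decode_legacy(raw, encoding))
        dst_layer.CreateFeature(dst_feat)

    dst_ds = None  # drop the reference so the output is flushed and closed
```

For the table in this thread, a call might look like `copy_layer_decoded(in_ds, "out.gpkg", "LDAnno", ["TextString"])`, where `out.gpkg` is a placeholder path. This is slower than a straight translate, since every feature passes through Python, but it keeps the decoding step explicit.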
