I'm trying a workflow using gdal.VectorTranslate(), since ogr2ogr isn't available anymore.
I'm running into an issue where the original data uses non-ASCII characters, but they get replaced by � when translated.
How can I make this not happen?
I tried setting the config options, but "OGR_FORCE_ASCII" is only honored by certain drivers and utilities. Similarly, I can't find a VectorTranslate option that would appear to take care of this.
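For reference, this is roughly the shape of what I tried (paths are placeholders, and the GPKG target is just to keep the sketch generic; the real output is a file GDB):

from osgeo import gdal

# Setting the config option before translating had no visible effect for me,
# which lines up with the docs saying only certain drivers honor it
gdal.SetConfigOption("OGR_FORCE_ASCII", "NO")
gdal.VectorTranslate("out.gpkg", "in.mdb", format="GPKG")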
This is kind of a major thing. If I absolutely have to, I suppose I can get the strings from the original data, but that will majorly slow things down, not to mention complicate things.
Thanks
Edit: it appears that this is dependent on the source; I have no problems going from fGDB to fGDB, but the real data is in an MDB.
It seems that pyodbc can read it just fine, which is super frustrating, since I can't figure out how to get the data into a different format without downloading a new driver, and that doesn't work if I'm trying to distribute this workflow to various users (unless ?)
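For what it's worth, the pyodbc side is just the stock Access connection string (path is a placeholder, and the driver name is whatever's registered on my machine):

import pyodbc

conn = pyodbc.connect(
    r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"Dbq=C:\data\source.mdb;"
)
for row in conn.cursor().execute("SELECT TextString FROM LDAnno"):
    print(row.TextString)  # the fractions come through correctly here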
For context, this is what gdal is showing me for the strings:
bytearray(b'S\xbdNE\xbc')
I'm not sure how some things can read these correctly as fractions while others can't.
I was able to convert that string to hex, which I fed to an online converter and got the desired output, and then the converter immediately broke when I tried doing it again.
53bd4e45bc
Putting the string with the fractions into the same converter gives me this:
53 c2 bd 4e 45 c2 bc
As you can see, I'm missing the c2 bytes: ½ and ¼ are single bytes (bd, bc) in the source data, but their UTF-8 encodings each need a c2 lead byte.
Doing a bytes.fromhex().decode() fails with an "invalid start byte", since decode() assumes UTF-8 by default.
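A minimal repro with just the strings from above:

raw = bytes.fromhex("53bd4e45bc")
print("S½NE¼".encode("utf-8").hex(" "))  # '53 c2 bd 4e 45 c2 bc' -- note the c2 lead bytes
raw.decode()  # UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbd ... invalid start byte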
Kind of out of ideas here, so if anyone has any I'd really appreciate it
Okay, with the help of this post I'm able to read it alright:
import codecs
from osgeo import ogr

inmdb = ogr.Open(in_ds)
res = inmdb.ExecuteSQL("SELECT TextString FROM LDAnno")
for r in res:
    # Grab the raw bytes, round-trip through hex, and decode with the
    # Windows ANSI code page ("dbcs" is an alias for Python's "mbcs" codec)
    t = r.GetFieldAsBinary(0).hex()
    t = codecs.decode(t, "hex").decode("dbcs")
    print(t)
inmdb.ReleaseResultSet(res)
S½NE¼ W½SE¼ E½SW¼ E½SW¼ W½NE¼ W½NE¼ S½NW¼ N½SW¼
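For what it's worth, "dbcs" is just an alias for Python's Windows-only "mbcs" codec, i.e. whatever the machine's ANSI code page is. Assuming that's cp1252 here (which the fractions suggest), this one-liner does the same thing:

print(bytes.fromhex("53bd4e45bc").decode("cp1252"))  # S½NE¼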
The question remains: how do I force gdal to use that decoding instead of doing its own thing?
https://github.com/OSGeo/gdal/blob/9d2c301cb3e18d2fea3af32652d0a31de0447e10/apps/ogr2ogr_lib.cpp#L90
https://gdal.org/en/stable/doxygen/classCPLStringList.html
Seems like they're using a char array for strings? Could be that the source data isn't properly encoded, or is encoded as something that isn't utf-8 (latin1? cp1252?)
You're really having a lot of fun issues with encoding lately huh
It's ANSI, it appears.
I'm looking at finding all the text fields, cycling through them, and then going over the final product with an update cursor to set them to the correct values. Not 100% sure how I'm going to do all that (for reasons I don't really want to get into, I had to create a new unique ID field for each table during the Translate() process), but we're going to try. It'd be fine if the values just came over as-is and Pro couldn't read them, but they get evaluated as UTF-8, something freaks out, and the values get changed.
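Very roughly, the fix-up pass I have in mind looks like this (layer/field names are from my data; out_path and the fixed dict are placeholders, and building fixed from the source MDB with the dbcs trick above, plus matching rows to my new UID, is the part I haven't solved yet):

from osgeo import ogr

# fixed = {uid: correctly decoded string}, built from the source MDB
out = ogr.Open(out_path, update=1)
lyr = out.GetLayerByName("LDAnno")
for feat in lyr:
    uid = feat.GetField("UID")  # the unique ID field I added during Translate()
    if uid in fixed:
        feat.SetField("TextString", fixed[uid])
        lyr.SetFeature(feat)  # write the corrected value back
out = None  # close/flush the datasource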