Make 'sanitize_columns=True' an optional parameter on the Python API SEDF

06-15-2021 07:54 AM
JohnMDye
Occasional Contributor III

The Spatially Enabled Dataframe has a .to_featurelayer() method which allows users to easily publish a feature service from a Spatially Enabled Dataframe. It's a great, long-desired capability with really only one shortcoming that I can find.

The SEDF's .to_featurelayer() method has the unfortunate behavior of sanitizing column names, even if the column names are valid to begin with.

The result is that it changes column names from their original case to snake_case. For example, 'storeName' gets changed to 'store_name' when you use the .to_featurelayer() method to publish the data frame as a feature layer, even though there is nothing invalid about a feature layer with a field named 'storeName'.
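A rough sketch of the kind of renaming described above, for anyone who wants to predict which names will change. This only approximates the camelCase to snake_case step and is not the API's actual sanitizer; the helper name `to_snake_case` is hypothetical.

```python
import re


def to_snake_case(name: str) -> str:
    """Approximate camelCase -> snake_case conversion.

    Hypothetical helper for illustration only; it is NOT the
    sanitizer used by the ArcGIS API for Python.
    """
    # Insert an underscore wherever a lowercase letter or digit
    # is immediately followed by an uppercase letter.
    with_underscores = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return with_underscores.lower()


print(to_snake_case("storeName"))  # store_name
```

A name that is already lowercase with underscores passes through this sketch unchanged, which is why only mixed-case fields are affected in the example.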

Interestingly, however, the .to_featureclass() method exposes a sanitize_columns parameter, which defaults to True. That means you can set the parameter to False on .to_featureclass() to avoid this behavior, but for whatever reason the parameter wasn't exposed on the .to_featurelayer() method.

Please expose a 'sanitize_columns' parameter on the SEDF's .to_featurelayer() method so that we can disable this feature if we so desire. As with the SEDF's .to_featureclass() method, the 'sanitize_columns' parameter can default to True to avoid an unexpected change in behavior for users who might already be relying on it.

3 Comments
by Anonymous User

@JohnMDye thanks for this idea. The Python API team is considering this along with other ideas shared at https://community.esri.com/t5/arcgis-api-for-python-questions/i-m-done-with-spatially-enabled-datafr... and https://github.com/Esri/arcgis-python-api/issues/923.

In the upcoming v2.0 release, we are taking a broader look at IO operations on the SEDF and plan to expose this parameter on all appropriate operations. We also plan to switch the sanitizer's default to False and to make it stop mutating the original column names or indices of the calling DataFrame object. You can track the progress at https://github.com/Esri/arcgis-python-api/issues/923

ShaunWalbridge
Status changed to: Under Consideration
 
HildermesJoséMedeirosFilho

It would be good to change the sanitize method to use NFKD normalization.

import re
import unicodedata


def sanitize(string: str) -> str:
    """
    Remove special characters, e.g. "é" becomes "e".
    :param string: string to be sanitized
    :return: sanitized string
    """
    str_value = string
    if isinstance(str_value, str):
        # Decompose accented characters into base character + combining mark.
        nfkd_form = unicodedata.normalize('NFKD', str_value)
        # Drop the combining marks, keeping only the base characters.
        str_value = "".join(c for c in nfkd_form if not unicodedata.combining(c))
        # Replace anything that is not an ASCII letter or digit with "_".
        str_value = re.sub(r'[^A-Za-z0-9]', '_', str_value)
    return str_value

 

sanitize("colum a")
Out[14]: 'colum_a'

sanitize("@gps")
Out[15]: '_gps'

sanitize("résumé.a")
Out[16]: 'resume_a'

sanitize("Naïve")
Out[17]: 'Naive'

The current sanitize method is too aggressive, in my opinion. I hardly use it.