Make 'sanitize_columns=True' an optional parameter on the Python API SEDF in ArcGIS API for Python Ideas


JohnMDye — Tue, 15 Jun 2021 14:54:45 GMT

The Spatially Enabled Dataframe has a .to_featurelayer() method which allows users to easily publish a feature service from a Spatially Enabled Dataframe. It's a great, long-desired capability with, as far as I can find, only one shortcoming.

The SEDF's .to_featurelayer() method has the unfortunate behavior of sanitizing column names, even when the names are already valid.

As a result, it changes column names from their original case to snake_case. For example, 'storeName' is changed to 'store_name' when you use the .to_featurelayer() method to publish the DataFrame as a feature layer, even though there is absolutely nothing invalid about a feature layer with a field named 'storeName'.
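To make the renaming concrete, here is a rough approximation of the camelCase-to-snake_case conversion described above. It is a hypothetical sketch (`camel_to_snake` is an illustrative name), not the ArcGIS API's actual sanitizer, which may apply additional rules.

```python
import re


def camel_to_snake(name: str) -> str:
    """Hypothetical approximation of the renaming described above.

    Not the ArcGIS API's own sanitizer: it only inserts an underscore
    before an uppercase letter that follows a lowercase letter or digit,
    then lowercases the whole name.
    """
    snake = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name)
    return snake.lower()


print(camel_to_snake("storeName"))  # store_name
```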

Interestingly, however, the .to_featureclass() method exposes a sanitize_columns parameter that defaults to True, so you can set it to False on .to_featureclass() to avoid this behavior. For whatever reason, that parameter wasn't exposed on the .to_featurelayer() method.

Please expose a 'sanitize_columns' parameter on the SEDF's .to_featurelayer() method so that we can disable this behavior when we need to. As on the SEDF's .to_featureclass() method, the 'sanitize_columns' parameter could default to True to avoid an unexpected change in behavior for users who already rely on it.
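A minimal sketch of the requested opt-out, assuming a hypothetical `prepare_columns` helper and a placeholder sanitizer (neither is part of the ArcGIS API): the flag defaults to True so existing callers see no change, while False passes the names through untouched.

```python
import re
from typing import List


def _placeholder_sanitizer(name: str) -> str:
    # Placeholder rule only: camelCase -> snake_case, other invalid chars -> "_"
    name = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name)
    return re.sub(r"[^0-9a-zA-Z_]", "_", name).lower()


def prepare_columns(columns: List[str], sanitize_columns: bool = True) -> List[str]:
    """Hypothetical helper: return the column names a publish step would use."""
    if not sanitize_columns:
        # Opt-out path: keep the caller's names exactly as given
        return list(columns)
    # Default path (True) keeps the current behavior for existing callers
    return [_placeholder_sanitizer(c) for c in columns]


cols = ["storeName", "city"]
print(prepare_columns(cols))                          # ['store_name', 'city']
print(prepare_columns(cols, sanitize_columns=False))  # ['storeName', 'city']
```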

Re: Make 'sanitize_columns=True' an optional parameter on the Python API SEDF

Anonymous User — Thu, 21 Oct 2021 20:00:33 GMT

@JohnMDye, thanks for this idea. The Python API team is considering this along with other ideas shared at https://community.esri.com/t5/arcgis-api-for-python-questions/i-m-done-with-spatially-enabled-dataframes/m-p/1026149 and https://github.com/Esri/arcgis-python-api/issues/923.

In the upcoming v2.0 release, we are taking a broader look at IO operations on the SeDF and plan to expose this parameter on all appropriate operations. We also plan to switch the sanitizer's default to False, and to make the sanitizer stop mutating the original column names or indices of the calling DataFrame object. You can track progress at https://github.com/Esri/arcgis-python-api/issues/923
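A minimal sketch of the non-mutating behavior described above, assuming plain pandas, a hypothetical `sanitized_copy` helper, and a placeholder rule (replace anything outside letters, digits and underscore). It is not the v2.0 implementation. `DataFrame.rename` returns a new object by default, so the caller's column labels stay as they were.

```python
import re

import pandas as pd


def sanitized_copy(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a new DataFrame with sanitized column names.

    The caller's DataFrame keeps its original column labels and index.
    """
    # Placeholder rule: anything outside letters, digits and "_" becomes "_"
    mapping = {c: re.sub(r"[^0-9A-Za-z_]", "_", str(c)) for c in df.columns}
    # rename() with inplace=False (the default) returns a new object
    return df.rename(columns=mapping)


df = pd.DataFrame({"store Name": [1], "city": ["Austin"]})
out = sanitized_copy(df)
print(list(out.columns))  # ['store_Name', 'city']
print(list(df.columns))   # ['store Name', 'city']
```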

Re: Make 'sanitize_columns=True' an optional parameter on the Python API SEDF - Status changed to: Under Consideration

ShaunWalbridge — Thu, 21 Oct 2021 20:20:29 GMT

Re: Make 'sanitize_columns=True' an optional parameter on the Python API SEDF

HildermesJoséMedeirosFilho — Sun, 10 Jul 2022 15:42:50 GMT

It would be good to change the sanitize method to use NFKD normalization.

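import re
import unicodedata


def sanitize(string: str) -> str:
    """
    Strip accents (e.g. "é" becomes "e") and replace every remaining
    character outside A-Z, a-z, 0-9 with "_".
    :param string: string to be sanitized
    :return: sanitized string (non-str input is returned unchanged)
    """
    str_value = string
    if isinstance(str_value, str):
        # NFKD splits an accented letter into base letter + combining mark
        nfkd_form = unicodedata.normalize('NFKD', str_value)
        # Drop the combining marks, keeping the base letters
        str_value = "".join(c for c in nfkd_form if not unicodedata.combining(c))
        # Replace anything that is not an ASCII letter or digit with "_"
        str_value = re.sub(r'[^A-Za-z0-9]', '_', str_value)
    return str_value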

>>> sanitize("column a")
'column_a'
>>> sanitize("@gps")
'_gps'
>>> sanitize("résumé.a")
'resume_a'
>>> sanitize("Naïve")
'Naive'
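Why "résumé" becomes "resume" in the examples: NFKD normalization splits an accented letter into its base letter plus a combining mark, and the filter on `unicodedata.combining` drops the mark. A short standard-library sketch of that step:

```python
import unicodedata

# NFKD splits "é" (U+00E9) into "e" plus U+0301 COMBINING ACUTE ACCENT
decomposed = unicodedata.normalize("NFKD", "é")
print([hex(ord(c)) for c in decomposed])              # ['0x65', '0x301']
# combining() returns the canonical combining class: 0 for base letters
print([unicodedata.combining(c) for c in decomposed])  # [0, 230]
# Dropping the combining marks leaves only the base letter
print("".join(c for c in decomposed if not unicodedata.combining(c)))  # e
# Letters with no decomposition pass through NFKD unchanged
print(unicodedata.normalize("NFKD", "ß"))  # ß
```

One caveat: letters with no decomposition, such as "ß", are not reduced by NFKD, so the regex step in the sanitize function would still replace them with "_".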

The current sanitize method is too aggressive, in my opinion. I hardly use it.