Make 'sanitize_columns=True' an optional parameter on the Python API SEDF

JohnMDye

The Spatially Enabled Dataframe has a .to_featurelayer() method which allows users to easily publish a feature service from a Spatially Enabled Dataframe. It's a great, long-desired capability with only one shortcoming that I can find.

The SEDF's .to_featurelayer() method has the unfortunate behavior of sanitizing column names, even if the column names are valid to begin with.

The result is that it changes column names from their original case to snake_case. For example, 'storeName' gets changed to 'store_name' when you use the .to_featurelayer() method to publish the data frame as a feature layer, even though there is nothing invalid about a feature layer with a field named 'storeName'.

Interestingly, however, the .to_featureclass() method exposes a sanitize_columns parameter which defaults to True. This means you can set that parameter to False on the .to_featureclass() method to avoid this behavior, but for whatever reason this parameter wasn't exposed on the .to_featurelayer() method.
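As a rough sketch of the workaround described above, the helper below turns sanitizing off when exporting to a feature class. The helper name export_with_original_names is hypothetical, and `location` is assumed to be the output-path argument of .to_featureclass(); only the sanitize_columns parameter comes from this post.

```python
def export_with_original_names(sdf, location):
    """Export a Spatially Enabled DataFrame without renaming its columns.

    `sdf` is assumed to be a pandas DataFrame that has the `.spatial`
    accessor available (the accessor comes from the ArcGIS API for Python).
    The helper name and the `location` keyword are assumptions.
    """
    # sanitize_columns=False should keep names such as 'storeName' unchanged.
    return sdf.spatial.to_featureclass(location=location, sanitize_columns=False)
```

A call might look like export_with_original_names(sdf, "stores.gdb/stores"), where the path is made up for illustration.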

Please expose a 'sanitize_columns' parameter on the SEDF's .to_featurelayer() method so that we can disable this feature if we so desire. As with the SEDF's .to_featureclass() method, the 'sanitize_columns' parameter can default to True to avoid an unexpected change in behavior for users who might already be relying on it.

Anonymous User · ‎10-21-2021

@JohnMDye thanks for this idea. The Python API team is considering this along with other ideas shared at https://community.esri.com/t5/arcgis-api-for-python-questions/i-m-done-with-spatially-enabled-datafr... and https://github.com/Esri/arcgis-python-api/issues/923.

In the upcoming v2.0 release, we are taking a broader look at IO operations on the SEDF and plan to expose this parameter on all appropriate operations. We also plan to switch the sanitizer's default to False, and to make the sanitizer stop mutating the original column names or indices of the calling DataFrame object. You can track the progress at https://github.com/Esri/arcgis-python-api/issues/923
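To illustrate what a non-mutating sanitizer could look like, here is a minimal sketch that returns a mapping from original to sanitized names and leaves its input untouched. The function name sanitized_name_map and the snake_case rule are assumptions made for illustration; this is not the Python API's implementation.

```python
import re


def sanitized_name_map(columns):
    """Return {original_name: sanitized_name} without modifying `columns`."""
    mapping = {}
    for name in columns:
        # Put an underscore at each lowercase/digit-to-uppercase boundary,
        # then lowercase everything: 'storeName' -> 'store_name'.
        snake = re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', name).lower()
        # Replace any remaining character outside [a-z0-9_] with an underscore.
        mapping[name] = re.sub(r'[^a-z0-9_]', '_', snake)
    return mapping


columns = ['storeName', 'Total Sales', 'id']
renames = sanitized_name_map(columns)
```

Because the function only builds a dictionary, the caller decides whether and where to apply it. With pandas, for example, df.rename(columns=renames) returns a new DataFrame and leaves the original names in place.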

ShaunWalbridge · ‎10-21-2021

HildermesJoséMedeirosFilho · ‎07-10-2022

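It would be good to change the sanitize method to use NFKD Unicode normalization, so that accented characters are reduced to their base letters instead of being replaced.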

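import re
import unicodedata


def sanitize(string: str) -> str:
    """
    Replace accented characters with their base letters (e.g. "é" becomes "e")
    and replace every remaining non-alphanumeric character with "_".
    :param string: string to be sanitized
    :return: sanitized string
    """
    str_value = string
    if isinstance(str_value, str):
        # NFKD splits an accented character into a base letter plus combining marks.
        nfkd_form = unicodedata.normalize('NFKD', str_value)
        # Drop the combining marks, keeping only the base characters.
        str_value = "".join(c for c in nfkd_form if not unicodedata.combining(c))
        # Replace anything that is not an ASCII letter or digit with an underscore.
        str_value = re.sub(r'[^A-Za-z0-9]', '_', str_value)
    return str_value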

sanitize("colum a")
Out[14]: 'colum_a'

sanitize("@gps")
Out[15]: '_gps'

sanitize("résumé.a")
Out[16]: 'resume_a'

sanitize("Naïve")
Out[17]: 'Naive'

The current sanitize method is too aggressive in my opinion. I hardly use it.
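One way to make a sanitizer less aggressive is to leave a name alone when it is already valid and only fall back to the NFKD approach otherwise. The sketch below assumes, purely for illustration, that a valid name starts with an ASCII letter or underscore and contains only ASCII letters, digits, and underscores; the real rules for feature layer field names may differ. The name sanitize_if_needed is hypothetical.

```python
import re
import unicodedata

# Assumed validity rule, for illustration only.
VALID_NAME = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def sanitize_if_needed(name: str) -> str:
    """Return `name` unchanged if it already looks valid, else sanitize it."""
    if VALID_NAME.fullmatch(name):
        return name
    # Decompose accented characters and drop the combining marks.
    nfkd_form = unicodedata.normalize('NFKD', name)
    stripped = ''.join(c for c in nfkd_form if not unicodedata.combining(c))
    # Replace anything that is not an ASCII letter or digit with an underscore.
    return re.sub(r'[^A-Za-z0-9]', '_', stripped)
```

With this rule a name like 'storeName' passes through untouched, while 'résumé.a' is still reduced to 'resume_a'.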