ArcGIS Pro Python Toolboxes: Patterns for Distribution

a-j-campbell · 09-23-2024

Hi all!

I am working with an Esri partner to modernize an existing Python toolbox in preparation for commercialization on their behalf. We've encountered a couple of questions that I have yet to find answers to, and I believe they will apply to a broad swath of users.

At this stage, we're focusing on the encryption and packaging of source code. First, I'm going to outline the work already done on encryption, but I will primarily have questions on packaging, as I think it will influence how we carry out the former anyway.

Encryption

Due to the size of our code base, we've gone ahead and separated most of the core logic into separate modules that are then brought into the toolbox via import in the .PYT. Through some experimentation, I've been able to test the existing "Right-click --> Encrypt..." workflow as well as applying EncryptPYT across our modularized repo structure, and have found that neither option, in my current implementation, covers our ideal scenario.

When running encrypt against solely the Python Toolbox (.PYT), our source code is masked in the UI, but depending on packaging, the source of the imported modules would still be available for a savvy and/or curious user to discover on their file system. I've also found that applying the arcpy function to each sub-module renders them unreadable by Pro (and likely Python itself), as I imagine neither handles on-the-fly decryption without a larger lift.
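As a rough sketch of how the .PYT-only encryption could be scripted as a build step, the function below copies the toolbox into a staging folder and encrypts only the copy, so the readable source stays in the repo. The names `stage_encrypted_toolbox` and `encrypt` are hypothetical, and it assumes `arcpy.EncryptPYT(toolbox, password)` rewrites the file it is given in place. It does nothing for the imported modules, which is the gap described above.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def stage_encrypted_toolbox(
    src_pyt: Path,
    dist_dir: Path,
    password: str,
    encrypt: Optional[Callable[[str, str], None]] = None,
) -> Path:
    """Copy a .pyt into dist_dir and encrypt the copy, leaving the source untouched."""
    if encrypt is None:
        # Imported lazily so the function can be exercised outside ArcGIS Pro.
        # Assumption: EncryptPYT modifies the given file in place.
        import arcpy
        encrypt = arcpy.EncryptPYT

    dist_dir.mkdir(parents=True, exist_ok=True)
    staged = dist_dir / src_pyt.name
    shutil.copy2(src_pyt, staged)
    encrypt(str(staged), password)
    return staged
```

Passing a stand-in for `encrypt` makes it possible to check the staging logic on a machine without Pro installed.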

Packaging

In conjunction with encryption, we are also brainstorming ideas around packaging and licensing toolboxes for end-user delivery. Our long-term goal would be to develop a pattern that provides an end-user experience that's nearly seamless from the perspective of Pro. Some examples of what this could look like include:

"Installing" a toolbox somewhere that Pro natively reads, so that the toolbox appears in an end-user's Geoprocessing pane like any other GP tool
Licensing tools via Named Users, as this is how most customers are likely managing extensions/add-on functionality
Packaging toolbox code in a way that protects author IP, while allowing full functionality to the end-user

Do y'all have any recommendations on patterns you've found to provide some of these core aspects of design? For most cases, are Python toolboxes being distributed internally, and therefore IP considerations may not apply? If so, I'd still love to discuss successful packaging patterns that allow your users to develop and execute your custom logic most efficiently!

Through numerous posts in this community and across others, we've been able to leverage patterns for a handful of design challenges, so thank you all in advance!

HaydenWelch · 09-23-2024

I've been working on developing a framework for modular PYT toolboxes under the name pytframe2.

Our internal workflow requires that tools be separated from the toolbox implementation so a central toolbox can be updated on the fly. The biggest issue with this is that separate tool modules are not reloaded until arc or the arc python interpreter restarts. To remedy this I wrote some reloading logic that sits in the .pyt file and reloads all helper modules when the .pyt file is refreshed from arc.
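For anyone who hasn't seen this kind of reload logic, a minimal sketch using only the standard library might look like the following. It is not the pytframe2 implementation; `reload_helpers` and the idea of selecting modules by a package prefix are assumptions made for illustration.

```python
import importlib
import sys
from types import ModuleType


def reload_helpers(prefix: str) -> list[str]:
    """Reload every already-imported module named prefix or living under prefix."""
    targets: list[ModuleType] = [
        module
        for name, module in sorted(sys.modules.items())
        if module is not None and (name == prefix or name.startswith(prefix + "."))
    ]
    # Sorted names put a package before its submodules.
    for module in targets:
        importlib.reload(module)
    return [module.__name__ for module in targets]
```

Calling something like `reload_helpers("tools")` near the top of the .pyt, before the tool classes are imported, would re-execute the helper modules each time the toolbox is refreshed. Modules that import from each other may need to be reloaded in dependency order, which this sketch does not attempt.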

We have not had to distribute this box yet, but I feel like distribution of a framework like this could be handled through git. I even have a simple git tool in the example toolbox that allows a user to sync their repo with a remote or change branches from within arc.

If you want me to go into more detail I'd be glad to. Been working with toolboxes using this system for about 2 years now with iterative improvements over time. One of the most useful lately has been my FeatureClass model that allows a more "pythonic" way to interact with features. It basically just hides the boilerplate logic of setting up and tearing down cursors so you can interact with a feature class as if it was a Python collection object.
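To illustrate the general idea (this is not the actual pytframe2 FeatureClass model), a wrapper can make a feature class iterable by opening and closing the cursor inside `__iter__`. The class below and its `cursor_factory` parameter are hypothetical; by default it assumes `arcpy.da.SearchCursor(path, fields)`, which yields tuples and works as a context manager.

```python
from typing import Any, Callable, Iterator, Optional, Sequence


class FeatureClass:
    """Iterate a feature class as dict rows, hiding cursor setup and teardown."""

    def __init__(
        self,
        path: str,
        fields: Sequence[str],
        cursor_factory: Optional[Callable[..., Any]] = None,
    ) -> None:
        self.path = path
        self.fields = list(fields)
        self._cursor_factory = cursor_factory

    def _open_cursor(self) -> Any:
        factory = self._cursor_factory
        if factory is None:
            # Imported lazily so the class can be exercised without ArcGIS Pro.
            import arcpy
            factory = arcpy.da.SearchCursor
        return factory(self.path, self.fields)

    def __iter__(self) -> Iterator[dict[str, Any]]:
        # The with-block releases the cursor once iteration finishes
        # or the generator is closed.
        with self._open_cursor() as cursor:
            for row in cursor:
                yield dict(zip(self.fields, row))
```

With that in place, something like `[row["NAME"] for row in FeatureClass(path, ["NAME"])]` reads like ordinary Python, and the cursor handling stays in one spot.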

a-j-campbell · 09-23-2024

@HaydenWelch , thank you for linking back to this!

I did find your framework recently and appreciate the availability of your code. I have encountered the same challenges in my work and utilize a similar "hot reload" logic to ensure the central toolbox does pick up any changes to modules.

From a distribution perspective, are you simply reading the .pyt (and associated helpers) directly from a local source on each Pro workstation? (That would then be synced from source with the tool you mentioned). Our goal is to essentially provide a Windows-style installer to an end user who can then run it and have access to the toolbox in Pro (while protecting our code), but I don't believe we have a good picture of where we should/shouldn't install said files.

I may be reaching back out to discuss further the implementations you've taken in the framework. And hopefully this post can serve as an additional home for discussion around .PYTs more broadly as I think it's a powerful pattern available to us developers and end-users.

Note - I did forget to mention that the discussion of a Pro Add-In is happening and could provide more concrete answers to how we can accomplish the distribution/encryption/licensing we're looking for. But would love some insight for python-based/only tools as well.

HaydenWelch · 09-23-2024

From a distribution perspective, are you simply reading the .pyt (and associate helpers) directly from a local source on each Pro workstation? (That would then be synced from source with the tool you mentioned). Our goal is to essentially provide a Windows-style installer to an end user who can then run it and have access to the toolbox in Pro (while protecting our code), but I don't believe we have a good picture of where we should/shouldn't install said files.

My current distribution setup is a central file location for all users. The toolbox is pre-loaded or loaded from a symlink and the modules are then loaded in normally from un-encrypted python code. In my use case the tools are more of a way to get updates pushed instantly to everyone in the company so I can write a tool for doing some specific task and everyone instantly has access.

I secure the code by utilizing git and tracking all changes in a protected remote repo. If I were to distribute this code, I'd probably compile it to .pyc files then add that as a release candidate for users to pull. You could also have protected branches so developers could edit the source then push a release candidate to the production branch that the users in the company have access to. Ideally you'd compile the files beforehand so the source isn't readily available.

That process would be best if you plan on long term development of the toolbox and want to build a CI/CD pipeline for it. Otherwise you could just manually package everything from the development environment and release it as an encrypted .pyt in whatever git/scm solution you use.
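The compile-to-.pyc step mentioned above could be sketched with the standard library's `py_compile`. The function name `build_sourceless` and the `src_dir`/`dist_dir` layout are assumptions for illustration, not part of any existing pipeline.

```python
import py_compile
from pathlib import Path


def build_sourceless(src_dir: Path, dist_dir: Path) -> list[Path]:
    """Compile every .py under src_dir to a .pyc at the same relative path in dist_dir."""
    written: list[Path] = []
    for source in sorted(src_dir.rglob("*.py")):
        target = (dist_dir / source.relative_to(src_dir)).with_suffix(".pyc")
        target.parent.mkdir(parents=True, exist_ok=True)
        # cfile puts the bytecode where the source file would sit
        # (not in __pycache__), which is the layout sourceless imports use.
        py_compile.compile(str(source), cfile=str(target), doraise=True)
        written.append(target)
    return written
```

Two caveats worth keeping in mind: .pyc files are tied to the Python version that produced them, so the build would need to run on the same Python version as the target Pro install, and bytecode can be decompiled, so this deters casual reading rather than protecting the code outright.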

If I were you, I'd use the commondata folder that is generated with new arc projects to house the toolbox repo. That way everyone has it in the same location and it is accessible no matter where that project ends up. You could also deploy it online, but relative imports don't work for online pyt files so you'd be SOL on the modularization system.

Maintaining an encrypted toolbox will be a lot of work because you'll be fighting against the language and the systems that ESRI provides, but it is possible. I'd also recommend getting the MD5 Hash of each release and maintaining a version dictionary of those hashes. That way if a user reports an issue you know exactly which release they're using without having to rely on a developer setting some sort of release flag in the metadata.
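A minimal sketch of the hash-per-release idea, assuming a release is just a folder of files. The function names and the shape of the `known` dictionary (hash to version label) are hypothetical.

```python
import hashlib
from pathlib import Path


def release_md5(release_dir: Path) -> str:
    """Return one MD5 hex digest covering every file under release_dir."""
    # MD5 is used here as a fingerprint only, not for security.
    digest = hashlib.md5(usedforsecurity=False)
    for path in sorted(p for p in release_dir.rglob("*") if p.is_file()):
        # Include the relative path so renames change the hash too.
        digest.update(path.relative_to(release_dir).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def identify_release(release_dir: Path, known: dict[str, str]) -> str:
    """Look up a release's version label by hash; 'unknown' if not recorded."""
    return known.get(release_md5(release_dir), "unknown")
```

The build step would record `release_md5(...)` next to each version label, and a support tool could call `identify_release` on a user's install. An "unknown" result would also flag installs that were modified after delivery.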

Luke_Pinner · 09-24-2024

For the encryption, some random thoughts:

Don't. Protect IP with an opensource or proprietary licence instead.
Write tools in a compiled language, call from Python.
SAAS. Tools stored on a geoprocessing server.
One big PYT. Have all modules separate from PYT for development and have a build step to combine all modules to PYT and encrypt. e.g. stickytape.
Load modules from password protected zipfile, e.g. ArchiveImporter and zip-import

Note I've never used any of those examples, they were just the first things I found with quick searches for existing implementations of my last two thoughts.

HaydenWelch · 09-25-2024

I don't think the stickytape solution will work sadly as it wouldn't be able to deal with the arcpy imports.

Been thinking about this a bit more and I think that using something like this might work:

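from importlib import import_module

def import_tools(tools: dict[str, list[str]]) -> list[type]:
    """Import the named tool classes from each module.

    Plain imports for now; licensing checks and decryption would go here.
    """
    classes: list[type] = []
    for module_name, class_names in tools.items():
        module = import_module(module_name)
        classes.extend(getattr(module, name) for name in class_names)
    return classes

# Maps each tool module to the tool classes it provides
TOOLS = {
    "tool_module": [
        "tool_class0",
        "tool_class1",
    ],
    "tool_module2": [
        "tool_class2",
        "tool_class3",
    ],
}

IMPORTS: list[type] = import_tools(TOOLS)

# Manually add the tools to the global namespace
globals().update({tool.__name__: tool for tool in IMPORTS})

class Toolbox(object):
    def __init__(self):
        self.label = "Toolbox"
        self.alias = self.label.replace(" ", "")
        self.tools: list[type] = IMPORTS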

Where the import_tools function could handle all the licensing and decryption of the tool code. That way you could ship just the .pyt like this and handle the licensing on your server that hosts the production code. This would also mean that users could get updates automatically if you're pulling the tool scripts from a remote location.
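To make the decryption side a bit more concrete, here is a sketch of a variant of `import_tools` that builds modules from protected payloads held in memory instead of files on disk. Everything here is an assumption for illustration: the extra `payloads` argument, `load_module_from_payload`, and especially `decode_payload`, which uses base64 purely as a placeholder (it is an encoding, not encryption). Fetching payloads from a server and checking a license are left out.

```python
import base64
import sys
from types import ModuleType
from typing import Callable


def decode_payload(payload: bytes) -> str:
    """Placeholder for real decryption: base64 is an encoding, not encryption."""
    return base64.b64decode(payload).decode("utf-8")


def load_module_from_payload(
    name: str,
    payload: bytes,
    decode: Callable[[bytes], str] = decode_payload,
) -> ModuleType:
    """Build a module object from protected source without writing it to disk."""
    module = ModuleType(name)
    code = compile(decode(payload), filename=f"<protected:{name}>", mode="exec")
    exec(code, module.__dict__)
    # Register only after the code ran cleanly, so a failed load leaves no stub.
    sys.modules[name] = module
    return module


def import_tools(payloads: dict[str, bytes], tools: dict[str, list[str]]) -> list[type]:
    """Resolve each listed tool class from its decoded module."""
    classes: list[type] = []
    for module_name, class_names in tools.items():
        module = load_module_from_payload(module_name, payloads[module_name])
        classes.extend(getattr(module, class_name) for class_name in class_names)
    return classes
```

Keep in mind that once the classes are loaded, the decoded code lives in the Python process, so this raises the bar for a curious user without making the logic unreachable.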

DuncanHornby · 09-25-2024

I develop RivEX and thought you might be interested in how I protect my source code. I suspect I have a "Heath Robinson" approach compared to others! I develop in an atbx, not a PYT.

So I edit directly into my atbx toolbox. Sometimes I do quick edits from the interface provided when you right click on your script tool and go to properties. When I'm writing the bulk of the code I right click > edit and it all happens in VSCode. So I save my edits and shut down VSCode and it magically appears inside the atbx.

I have a subfolder with some python modules, these hold generic functions and are unprotected. But in themselves they are a small cog in a big machine, so knowing anything about these functions does not really expose the interesting code stored in the atbx.

When looking at the properties of a script there is a button to encrypt the code in the execution section. I use this to protect the source code.

Any code in the validation section is completely unprotected, seems like a design flaw to me.

I have a simple XML file that encrypts a unique number extracted from the OS, and all tools have a function that decrypts this. The XML is built by me after someone purchases a license.

I zip all this up and it's available on the website, but it limits itself if there is no valid XML file.
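A machine-locked license file along these lines could be sketched as follows. This is not the RivEX implementation: it swaps the encrypt/decrypt step for an HMAC signature, uses `uuid.getnode()` as one possible machine identifier, and all function names and the XML layout are made up for illustration.

```python
import hashlib
import hmac
import uuid
import xml.etree.ElementTree as ET


def machine_id() -> str:
    """One possible machine identifier: the hardware address as 12 hex digits."""
    return format(uuid.getnode(), "012x")


def _sign(machine: str, secret: bytes) -> str:
    return hmac.new(secret, machine.encode("utf-8"), hashlib.sha256).hexdigest()


def build_license_xml(machine: str, secret: bytes) -> str:
    """Vendor side: sign a customer's machine ID and wrap it in XML."""
    root = ET.Element("license")
    ET.SubElement(root, "machine").text = machine
    ET.SubElement(root, "signature").text = _sign(machine, secret)
    return ET.tostring(root, encoding="unicode")


def is_licensed(license_xml: str, secret: bytes, machine: str) -> bool:
    """Tool side: accept only a license signed for this machine."""
    try:
        root = ET.fromstring(license_xml)
    except ET.ParseError:
        return False
    licensed_machine = root.findtext("machine", default="")
    signature = root.findtext("signature", default="")
    expected = _sign(machine, secret)
    # Compare as bytes in constant time.
    matches = hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
    return licensed_machine == machine and matches
```

A tool would call `is_licensed(xml_text, secret, machine_id())` at the start of execution and limit itself when it returns False. The weak points are that the secret has to ship inside the (encrypted) tool code, and that `uuid.getnode()` can fall back to a random value on machines where no hardware address is found.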

Limitations of my approach:

Requires user to install into the root directory and nowhere else as the xml lookup is hardwired.
There is no automated way of calling the encrypt button, so I have to manually do it. As I'm the sole developer it's just a pain in the a$$ and something I live with! Have 50 tools, then you need to provide 50 passwords and press that encrypt button 50 times. Not at all ideal.
I push as much of my generic code out into the subfolder, this code is free to see.
Validation code is unprotected, so any decrypting of license files which would seem sensible to do in the validation section needs to actually be done in the execution section, this means every tool is duplicating these functions. Not ideal.

Positives:

Single zip file to distribute and thus easy to install
Maintaining code is easy; it's the encryption step that is laborious.
Can take advantage of new functionality in toolbox as ESRI releases it.

Hope that helps?

TopoTheMornin · 11-10-2024

I’d love to know what distribution pattern you landed on. Did you end up wrapping the .pyt toolbox in an add-in? Can you share?