Classify Objects Using Deep Learning with VisionLanguageClassification

cepsgis · 03-19-2025

Howdy,

I have been running into a number of issues trying to get this model to work, and I'm not sure what I'm doing wrong; I suspect it has something to do with my AI connection file. I have tried both OpenAI and Llama. OpenAI generates output, but it doesn't appear to be using my API key, since no credits are being consumed. After fiddling with that, I switched to Llama, which just crashes ArcGIS Pro.

Here's how my OpenAI .ais file looks:

{
"service_provider" : "OpenAI",
"api_key" : "API_KEY"
}

Here's how my Llama .ais file looks:

{
"service_provider" : "local-llama"
}
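A quick way to rule out a malformed connection file is to parse it before pointing the tool at it. The sketch below assumes the keys discussed in this thread (service_provider, api_key, deployment_name) rather than any official Esri schema; it flags invalid JSON, a missing provider, and API keys that still look like a template value. The helper names check_ais and check_ais_file and the placeholder set are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical values that look like an unfilled template rather than a real key.
PLACEHOLDER_KEYS = {"", "API_KEY", "your_api_key"}


def check_ais(text):
    """Return a list of problems found in the text of an .ais file."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as exc:
        # Comments (// ...) and trailing commas both end up here.
        return [f"not valid JSON: {exc}"]
    if not isinstance(cfg, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    provider = cfg.get("service_provider")
    if not provider:
        problems.append("missing 'service_provider'")
    elif str(provider).lower() == "openai":
        key = cfg.get("api_key")
        if not isinstance(key, str) or key.strip() in PLACEHOLDER_KEYS:
            problems.append("'api_key' is missing or still a placeholder")
        if "deployment_name" not in cfg:
            problems.append("no 'deployment_name' (model) specified")
    return problems


def check_ais_file(path):
    # utf-8-sig also strips a byte-order mark, which json.loads would reject.
    return check_ais(Path(path).read_text(encoding="utf-8-sig"))
```

An empty list only means these particular checks passed; it does not confirm that the tool is actually sending the key to the provider. The "API_KEY" value in the post above may simply be redacted.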

@RohitSingh2

DanPatterson · 03-19-2025

You have this question posted in the ArcGIS Living Atlas of the World place in Community.

Is this your intended location, or are you looking to move it elsewhere, such as ArcGIS Image Analyst - Esri Community?

... sort of retired...

cepsgis · 03-19-2025

@DanPatterson I was a little iffy on where exactly to post it. Maybe I'll cross-post it.

SupratimBanik · 03-20-2025

Hi @cepsgis,
To use OpenAI models, your .ais file should be structured as follows:
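{
"service_provider": "OpenAI",
"api_key": "your_api_key",
"deployment_name": "gpt-4o"
}

Set deployment_name to the model you want to use (e.g., gpt-4o or gpt-4). Don't put comments in the actual file: JSON has no comment syntax, so a line such as // Change this... makes the file invalid.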

For using a local LLaMA model, your .ais file should look like this:

{
"service_provider": "local-llama"
}
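Because JSON allows neither comments nor trailing commas, generating the file from Python avoids hand-editing slips. This is a minimal sketch, assuming the .ais file is plain JSON with the keys shown above; write_ais is a hypothetical helper, not part of arcpy or the DLPK.

```python
import json
from pathlib import Path


def write_ais(path, service_provider, **settings):
    """Write an AI connection (.ais) file as strict JSON and return the dict written.

    json.dumps never emits comments or trailing commas, so the result always
    parses; which keys the tool actually reads is an assumption taken from
    this thread (service_provider, api_key, deployment_name).
    """
    config = {"service_provider": service_provider, **settings}
    Path(path).write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config
```

For example, write_ais(r"C:\batch\vision-language\openai.ais", "OpenAI", api_key="your_api_key", deployment_name="gpt-4o") and write_ais(r"C:\batch\vision-language\llama.ais", "local-llama") produce the two files shown above (the paths are illustrative).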

To use a LLaMA model locally, follow these steps:

1. Create a Hugging Face account at https://huggingface.co/join
2. Open a Python command prompt and run huggingface-cli login. This will prompt you for an access token, which you can get from https://huggingface.co/settings/tokens.
3. Visit https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct and accept the terms to gain access to the model.
4. Run huggingface-cli download meta-llama/Llama-3.2-11B-Vision-Instruct in the Python command prompt to download the model.

After these steps, make sure the Llama weights are present at C:\Users\<username>\.cache\huggingface\hub\models--meta-llama--Llama-3.2-11B-Vision-Instruct on your machine.
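To confirm the download landed where the answer expects, the sketch below builds the Hugging Face hub cache folder for a repo and checks for a downloaded snapshot. It assumes the standard hub cache layout (models--<org>--<name>/snapshots/<revision>) and that the default root can be moved with the HF_HUB_CACHE or HF_HOME environment variables; whether the DLPK honours those overrides is not stated in this thread. Both helper names are hypothetical.

```python
import os
from pathlib import Path


def hf_model_cache_dir(repo_id, cache_root=None):
    """Return the hub cache folder for a model repo ("org/name" -> "models--org--name")."""
    if cache_root is None:
        if os.environ.get("HF_HUB_CACHE"):
            cache_root = os.environ["HF_HUB_CACHE"]
        elif os.environ.get("HF_HOME"):
            cache_root = os.path.join(os.environ["HF_HOME"], "hub")
        else:
            # Default location, e.g. C:\Users\<username>\.cache\huggingface\hub
            cache_root = Path.home() / ".cache" / "huggingface" / "hub"
    return Path(cache_root) / ("models--" + repo_id.replace("/", "--"))


def has_downloaded_snapshot(repo_id, cache_root=None):
    """True if at least one revision folder exists under snapshots/."""
    snapshots = hf_model_cache_dir(repo_id, cache_root) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())
```

Running has_downloaded_snapshot("meta-llama/Llama-3.2-11B-Vision-Instruct") in the same Python environment after step 4 should return True if the download completed.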

This should help you set up the models correctly. Let me know if you run into any issues!

cepsgis · 04-24-2025

Okay, after extensive debugging, I’ve identified the issue. Running this model with local-llama isn’t possible because the deep learning libraries currently available through Esri do not meet the model's minimum requirements, specifically for Torch and Transformers. Any suggestions?

SupratimBanik · 04-25-2025

Hi @cepsgis,
The required versions of the libraries are packaged within the DLPK itself. If you're encountering any specific errors while trying to run the model, could you please share the details? I'd be happy to help troubleshoot further.

cepsgis · 04-25-2025

Classify Objects Using Deep Learning
=====================
Tool Path

Input Raster 615950.sid
Output Classified Objects Feature Class C:\batch\vision-language\Default.gdb\c615950_ClassifyObjectsUsing
Model Definition C:\batch\vision-language\VisionLanguageClassification.dlpk
Input Features
Class Label Field ClassLabel
Processing Mode PROCESS_AS_MOSAICKED_IMAGE
Arguments classes 'Grass, Rock';additional_context 'You are looking at arieal imagery, you need to find all the rock and grass for this imagery';strict_classification false;ai_connection_file C:\batch\vision-language\scripts\ai_connection_file.json
Caption Caption
=====================
Messages

Start Time: Friday, April 25, 2025 8:07:14 AM
ERROR 999999: Something unexpected caused the tool to fail. Contact Esri Technical Support (http://esriurl.com/support) to Report a Bug, and refer to the error help for potential solutions or workarounds.
Unable to obtain configuration properties associated with the raster function.
Traceback (most recent call last):
File "C:\Users\BCSERE~1\AppData\Local\Temp\ArcGISProTemp15504\VisionLanguageClassification.dlpk\VisionLanguageClassification.py", line 352, in getConfiguration
import torch
ModuleNotFoundError: No module named 'torch'
Configuration properties returned by the python raster function is not a python dictionary.
Failed to execute (ClassifyObjectsUsingDeepLearning).
Failed at Friday, April 25, 2025 8:07:15 AM (Elapsed Time: 0.50 seconds)
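The Arguments line in these messages is a semicolon-separated list of name value pairs, with values that contain spaces or commas wrapped in single quotes. When scripting the tool, a small helper can assemble that string from a dict so the quoting stays consistent. This sketch only mirrors the echoed format; that the tool accepts exactly this string for its model arguments is an assumption, and format_model_arguments is a hypothetical name. It does not escape values that themselves contain quotes or semicolons.

```python
def format_model_arguments(args):
    """Join name/value pairs as 'name value;name value', quoting values with spaces or commas."""
    parts = []
    for name, value in args.items():
        # Booleans are echoed in lowercase, e.g. "strict_classification false".
        text = ("true" if value else "false") if isinstance(value, bool) else str(value)
        if any(ch in text for ch in " ,"):
            text = "'" + text + "'"
        parts.append(f"{name} {text}")
    return ";".join(parts)
```

For example, format_model_arguments({"classes": "Grass, Rock", "strict_classification": False}) gives classes 'Grass, Rock';strict_classification false.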

SupratimBanik · 04-28-2025

Hi @cepsgis,
Before using the Llama vision model, ensure that the supported deep learning libraries are installed. For more details, check the Deep Learning Libraries Installer for ArcGIS. Torch is part of the deep learning libraries installer.
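The traceback above ends in ModuleNotFoundError: No module named 'torch', meaning the Python environment ArcGIS Pro used could not see PyTorch. Running the snippet below from the ArcGIS Pro Python window (or the Python Command Prompt for the active environment) shows which environment is active and whether torch and transformers are importable from it. check_modules is a hypothetical helper that uses only the standard library; it locates modules without importing them.

```python
import importlib.util
import sys


def check_modules(names=("torch", "transformers")):
    """Map each top-level module name to whether this interpreter can import it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# sys.prefix is the active environment folder; in the ArcGIS Pro Python window
# sys.executable may point at ArcGISPro.exe rather than python.exe.
print("Environment:", sys.prefix)
print("Interpreter:", sys.executable)
for name, found in check_modules().items():
    print(f"{name}: {'found' if found else 'NOT found'}")
```

If torch shows as NOT found, the libraries are either not installed or installed into a different environment than the one Pro is using.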

cepsgis · 04-29-2025

The Deep Learning Libraries ship with PyTorch 2.0.1.
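To check installed versions against what the model needs, the sketch below reads package metadata and compares release numbers. Assumptions: the transformers minimum of 4.45.0 reflects the release that added the Mllama classes used by Llama 3.2 Vision and should be replaced by whatever the DLPK actually declares; no torch minimum is given in this thread, so torch is only reported. Helper names are hypothetical, and pre-release suffixes such as .dev0 are ignored rather than ranked.

```python
import re
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ("torch", "transformers")
# Assumption: transformers 4.45.0 introduced the Mllama (Llama 3.2 Vision) classes.
# Replace with the DLPK's declared requirements; no torch minimum is assumed here.
MINIMUMS = {"transformers": "4.45.0"}


def parse_version(text):
    """Leading numeric release part, e.g. '2.0.1+cu118' -> (2, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare release numbers, padding with zeros so '2.0' equals '2.0.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def report(packages=PACKAGES, minimums=MINIMUMS):
    """Print each package's installed version and whether it meets its minimum."""
    for pkg in packages:
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            print(f"{pkg}: not installed")
            continue
        minimum = minimums.get(pkg)
        if minimum is None:
            print(f"{pkg} {installed}: no minimum configured")
        elif meets_minimum(installed, minimum):
            print(f"{pkg} {installed}: OK (>= {minimum})")
        else:
            print(f"{pkg} {installed}: below required {minimum}")
```

Calling report() from the ArcGIS Pro Python window prints one line per package for the active environment.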