We’ve all heard the sentiment that a picture is worth a thousand words. Through the advancement of AI technologies, this idea has become more tangible than ever. Computer vision is a field of AI that draws on machine learning, deep learning, and generative AI to identify objects, read text, and analyze patterns in images and videos. It effectively transforms visual content into rich, actionable information. Many utility field operations already capture images or videos as part of their data collection workflows, so computer vision can enhance those workflows without major disruption to current processes. While computer vision adoption continues to grow across industries, its successful application in ArcGIS Survey123 is heavily dependent on one critical factor: well-designed prompts.
Why Prompt Design Matters
Working with computer vision is like working with a coworker: you need to supply the LLM (large language model) with clear and detailed instructions, known as prompts, so that it can perform asset inspections and produce the detailed results you are seeking. Avoiding ambiguity is key, as it prevents computer vision from applying its own interpretation of the prompt or returning generic results.
Poorly designed prompts can lead to inconsistent, irrelevant, or even misleading results. In field inspections, this can mean missed maintenance issues, inaccurate reports, or wasted resources. Let’s discuss ways that we can improve our prompt writing skills.
Define Specific Objectives
The first step in designing effective computer vision prompts is to define clear and specific objectives: exactly what you want computer vision to do. We suggest using the four Ws (who, what, where, and why) when starting to design your prompts. They will help you gather the essential information needed to define the request and clarify the goal you want computer vision to accomplish. You’ll want to include contextual details about the type of asset, the environment the asset is in, and any regulations or company policy driving the need. For example, a basic prompt for maintenance inspections could look like this:
“Identify asset maintenance concerns in this image.”
Through careful prompt design, we can transform the request into something more specific:
“Describe the condition of the gas meter. Include details about any rust, corrosion, or damage that is present on the gas meter component of the gas meter assembly. This is part of the exposed gas pipe inspection that occurs every three years.”
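When the same inspection is repeated across several asset types, it can help to assemble the specific prompt from its parts so that the asset, the conditions of interest, and the reason for the inspection are never left out. The following is a minimal sketch of that idea. The function names and parameters are hypothetical and are not part of ArcGIS Survey123; the output is simply text you could paste into your survey's prompt.

```python
def join_with_or(items):
    """Join items as 'a, b, or c' for use in a sentence."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} or {items[1]}"
    return ", ".join(items[:-1]) + f", or {items[-1]}"


def build_inspection_prompt(asset, conditions, context):
    """Assemble a specific prompt: what to inspect, what to look for, and why.

    All three parameters are illustrative names chosen for this sketch.
    """
    return (
        f"Describe the condition of the {asset}. "
        f"Include details about any {join_with_or(conditions)} that is present. "
        f"{context}"
    )


prompt = build_inspection_prompt(
    asset="gas meter",
    conditions=["rust", "corrosion", "damage"],
    context="This is part of the exposed gas pipe inspection that occurs every three years.",
)
print(prompt)
```

Keeping the parts separate makes it easier to review each of the four Ws on its own before the prompt goes into a survey.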
Use Simple, Direct Language
Industries often rely on specialized acronyms and jargon to communicate quickly. However, these terms can be ambiguous, as the same acronym may represent different concepts depending on the context or department. For example, PM could mean preventive maintenance in asset management or project manager in operations. To ensure computer vision fully understands your request, it’s essential to use clear, descriptive language in your prompts. Instead of relying on shorthand or industry-specific jargon, spell out the full names of items, processes, or conditions. For instance, rather than asking computer vision to “check PM status,” specify “check the preventive maintenance status of the transformer.” Technical terms can be made clearer by providing a brief definition the first time they appear in your prompt. This approach minimizes misinterpretation and helps computer vision deliver precise, actionable results.
Specify Result Format and Length
When designing prompts for computer vision analysis, it’s important to consider not only the accuracy of the results but also how those results will integrate into your system. ArcGIS Survey123 stores the results in an attribute field, which has a field type and length.
Keep this in mind as you design the prompt: the results returned need to match the field type and fit within the specified field length.
Some result formatting instructions you could add to your prompt:
“Provide a concise summary of 500 characters or less of….”
“Use one word to describe…”
“Count the transformers on the electric pole and return the total count as an integer.”
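During testing, you can check whether the results you are getting would actually fit the target field. The sketch below assumes you have copied sample results out of your test survey as plain strings. The `field_type` and `max_length` parameters are hypothetical stand-ins for your field's definition, not Survey123 API names, and only two field types are handled.

```python
import re


def fits_field(result, field_type, max_length=None):
    """Return True if a model result could be stored in the target field.

    field_type and max_length are illustrative stand-ins for the
    attribute field's definition; they are not Survey123 API names.
    """
    text = result.strip()
    if field_type == "integer":
        # Accept only plain whole numbers such as "3" or "-1".
        return re.fullmatch(r"[+-]?[0-9]+", text) is not None
    if field_type == "text":
        # A text result fits when no limit is set or it is within the limit.
        return max_length is None or len(text) <= max_length
    raise ValueError(f"Unsupported field type: {field_type}")


print(fits_field("3", "integer"))                       # True
print(fits_field("There are 3 transformers", "integer"))  # False
print(fits_field("Light surface rust.", "text", 500))   # True
```

If a result such as "There are 3 transformers" fails the integer check, that is a signal to tighten the prompt, for example by asking for the total count as an integer only.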
Make Prompts Closed-Ended When Possible
Where appropriate, write survey questions as closed-ended questions. This approach enables you to define a specific set of possible answers and establish clear criteria for each asset condition in your prompt, resulting in more consistent and actionable outcomes. For instance, you might instruct the system to:
"Return Pass if the worker has a hard hat, high-visibility safety vest, and safety glasses, or Fail if the worker is missing one or more of these items."
"If damage has been detected, categorize it as Minor/Moderate/Critical based on the severity of the damage."
By framing questions in this way, you minimize ambiguity and ensure that both computer vision systems and human inspectors can interpret responses consistently. Ultimately, this leads to more reliable, standardized data that supports robust data analysis down the line.
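Closed-ended prompts also make results easy to check automatically while you are testing. As a minimal sketch, assuming you have exported sample responses as strings, the helper below maps a response onto one of the allowed answers and flags anything outside the set. The names `ALLOWED_SEVERITIES` and `normalize_choice` are hypothetical.

```python
ALLOWED_SEVERITIES = ("Minor", "Moderate", "Critical")


def normalize_choice(response, allowed):
    """Map a response onto one of the allowed answers, or return None.

    Ignores surrounding whitespace, straight quotes, trailing periods,
    and letter case, so "moderate." still counts as "Moderate".
    """
    cleaned = response.strip().strip(".\"'").casefold()
    for choice in allowed:
        if cleaned == choice.casefold():
            return choice
    return None


print(normalize_choice("moderate.", ALLOWED_SEVERITIES))           # Moderate
print(normalize_choice("The damage is severe", ALLOWED_SEVERITIES))  # None
```

A response that comes back as `None` here did not follow the closed-ended instruction, which usually means the prompt needs to state the allowed answers more explicitly.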
Error Handling
Mistakes are inevitable. At times an asset isn’t centered in the photo, or the photo is of such poor quality that its content is difficult to determine. You’ll want to take this into consideration when designing your prompt. Specify in the prompt what should happen if the asset isn’t present. This will prevent data from being generated when it shouldn’t be.
For example: “If there is no gas meter in the photo, return ‘No gas meter in photo.’”
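A fixed fallback phrase like this one is also easy to filter on later. The sketch below is one possible way, under the assumption that you post-process exported results yourself, to separate "asset not found" responses from real condition descriptions. `NO_ASSET_SENTINEL` and `is_missing_asset` are hypothetical names.

```python
NO_ASSET_SENTINEL = "No gas meter in photo."


def is_missing_asset(response, sentinel=NO_ASSET_SENTINEL):
    """Return True if the response is the agreed 'asset not found' phrase.

    The comparison ignores case, surrounding quotes, and a trailing
    period, since a model may not reproduce the phrase character for
    character.
    """
    def normalize(text):
        return text.strip().strip("'\"‘’“”").rstrip(".").casefold()

    return normalize(response) == normalize(sentinel)


print(is_missing_asset("no gas meter in photo"))            # True
print(is_missing_asset("The gas meter shows light rust."))  # False
```

Records flagged this way can be routed back to the field for a new photo instead of being treated as inspection results.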
Plan for Testing and Refinement
Before deploying your AI-enabled survey for production use, test your prompts to evaluate how accurate the results are, and adjust the prompts to improve them. Your testing plan should include both expected and unexpected inputs. Expected inputs can be photos from past asset inspections. Unexpected inputs can consist of photos of the incorrect asset, photos where the asset is not centered, or poor-quality photos.
Be sure to compare the result of the computer vision to results from field inspectors to evaluate how closely they match. If there is too much variation, adjust your prompt and try it again. You may need to experiment with multiple iterations of the prompts before you find the prompt that returns the results that you’re seeking.
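One simple way to quantify "how closely they match" is an agreement rate: the share of test photos where the computer vision result equals the inspector's result. The sketch below assumes closed-ended answers (such as Pass/Fail) collected into two lists in the same photo order. The function name and the sample data are hypothetical and for illustration only.

```python
def agreement_rate(model_results, inspector_results):
    """Return the fraction of photos where both results match.

    Both lists must be in the same photo order. The comparison ignores
    letter case and surrounding whitespace.
    """
    if len(model_results) != len(inspector_results):
        raise ValueError("Both lists must cover the same photos.")
    if not model_results:
        raise ValueError("At least one result is required.")
    matches = sum(
        m.strip().casefold() == i.strip().casefold()
        for m, i in zip(model_results, inspector_results)
    )
    return matches / len(model_results)


# Illustrative sample data, not real inspection results.
model = ["Pass", "Fail", "Pass", "Pass"]
inspector = ["Pass", "Fail", "Fail", "Pass"]

rate = agreement_rate(model, inspector)
print(f"Agreement: {rate:.0%}")  # Agreement: 75%
```

Tracking this number across prompt iterations gives you a consistent way to tell whether a revised prompt is actually an improvement. What counts as an acceptable rate is a decision for your organization.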
In conclusion, designing specific and well-structured prompts is not just a technical necessity but a strategic advantage for utility field operations. By defining clear objectives, eliminating ambiguity, and rigorously testing prompts, organizations unlock the full potential of computer vision, which drives higher data quality, operational efficiency, and regulatory compliance. As AI technologies continue to evolve, a commitment to prompt excellence helps your organization remain agile, safe, and ready to capitalize on new opportunities. Make prompt design a collaborative, ongoing practice, and empower your teams to lead the way in digital transformation.