AI Generated Image Description Myths Tested

[Illustration of a digital brain labelled AI within a pair of hands]

Image descriptions play a key role in accessibility. Providing descriptions of images in accessible books is a challenging and often time-consuming task, especially for complex content. This is why the publishing and accessibility communities are always seeking new ways to streamline the process. Artificial Intelligence offers the potential to aid this process, and many organizations are interested in, and actively investigating, how AI might help their workflows.

As part of our ongoing AI series, we’re exploring some of the myths around AI image descriptions, sharing some insights to provide a practical feel for what to expect when using AI to describe images.

Disclaimer: AI services are changing fast — this is just a snapshot of results that were accurate at the time of testing.

Testing Approach

We wanted to explore how AI services describe images, investigating some common myths around how the services function and explaining our findings. AI image description is increasingly being used in both specialist and mainstream publishing, as well as by blind individuals. Understanding the limitations and features of image description allows everyone to make the most of this rapidly evolving technology.

It is important to note that this is not a full evaluation or a test in a production environment. This is intended to be a snapshot, a practical reference to compare against your own experimentation.

We selected three of the leading AI image description services to test:

  • Gemini Flash
  • ChatGPT GPT-4o Turbo
  • Claude Sonnet

We picked sample images from an open science textbook. The images were described without any surrounding context to measure raw AI visual interpretation without any textual hints.
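
As a concrete illustration of the setup, the sketch below shows one way to submit a bare image with no accompanying context, using the OpenAI Python SDK. The model name and image URL are placeholders, not the exact values from our tests; the Gemini and Claude SDKs follow the same pattern of pairing a short text prompt with an image.

```python
# Minimal sketch: request a description of a bare image, with no
# surrounding text to hint at its meaning. The model name and URL are
# illustrative placeholders, not the exact values used in our tests.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any vision-capable model works here
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.org/figure.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```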

Myth 1: AI Image Descriptions Change Over Time

Is there any value in repeatedly asking the same AI service for an image description? Might this give a variety of options to choose from? And does the length of the time gap between requests make a difference?

Across the three services we tested, core elements (like subject, action, background) stayed mostly consistent.

Some slight variations became evident:

  • Gemini Flash was stable overall, with only slight rewording, but in some cases the second request was more accurate.
  • ChatGPT remained consistent, with small grammar or word changes. In one test it added significantly more detail, but this wasn’t something we could replicate in later tests.
  • Claude Sonnet was generally consistent, with variations in descriptive depth.

Overall, back-to-back prompting mostly results in small tweaks or reordering of content, but doesn’t change any of the key image elements.

Longer time gaps between requests also didn’t produce significant changes: in some cases there was slight rewording, in others the responses were identical to the first attempts.
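
A quick way to check this yourself is to request the same description twice and diff the two responses. The sketch below does this with the OpenAI SDK; the model name, prompt, and URL are again illustrative placeholders.

```python
# Hypothetical repeatability check: ask for the same description twice
# and print a word-level diff of the two responses.
import difflib

from openai import OpenAI

client = OpenAI()

def describe(url: str) -> str:
    """Request a description of the image at `url`."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Write alt text for this image."},
                {"type": "image_url", "image_url": {"url": url}},
            ],
        }],
    )
    return response.choices[0].message.content

first = describe("https://example.org/figure.jpg")
second = describe("https://example.org/figure.jpg")

# In our tests a diff like this showed only minor rewording, never a
# change to the key image elements.
for line in difflib.unified_diff(first.split(), second.split(), lineterm=""):
    print(line)
```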

Conclusion: Evidence suggests that there are no real benefits in repeatedly asking the same service for descriptions of the same image.

Myth 2: The Prompt Doesn’t Matter

Should you change the way you ask for your image to be described? Do all prompts generate similar responses?

We tested three different prompt styles:

  1. A basic prompt, just saying “write alt text.”
  2. A generic request, like “describe this image for a blind person.”
  3. A detailed prompt, guiding the AI on how to describe and what to include (with examples).

Detailed prompt example: “Provide a concise image description for accessibility purposes. Here is an example of how the output should look: ‘In a living room, a father reads a newspaper, a daughter studies, and a son chats with his mother.’”

Test example:

The following image was run through Gemini 2.0 Flash three times, once with each of the three prompts described above.

[Image: A photo shows a light-colored toad covered in bright green spots.]

Image source: https://openstax.org/books/biology-ap-courses/pages/1-2-themes-and-concepts-of-biology

  • Publication alt text: A photo shows a light-colored toad covered in bright green spots.
  • AI, detailed prompt: A green toad sits on a concrete surface.
  • AI, “write alt text” prompt: A European green toad rests on a concrete surface, its speckled green and grey skin glistening in the sunlight. The toad is positioned at a slight angle, with its left side facing the camera, allowing for a clear view of its eye and the intricate patterns on its skin. Its short legs are tucked beneath its body, and a small shadow extends from its right side, indicating a light source from the left.
  • AI, “describe this image for a blind person” prompt: The image shows a plump toad sitting on a rough, textured surface. The toad has a bumpy, mottled skin pattern of dark green and lighter grey-green patches. Its body is rounded and squat, with stubby legs tucked underneath. The light source seems to be coming from the right, casting a small shadow to the left of the toad. The surface it’s sitting on appears to be concrete or stone, with a slightly uneven texture.
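
A comparison like the one above is straightforward to script. The sketch below runs the three prompt styles against a single image using the google-generativeai Python SDK; the API key handling, model name, and image URL are illustrative assumptions rather than our exact test setup.

```python
# Hypothetical sketch: run three prompt styles against one image with
# the google-generativeai SDK. Key, model, and URL are placeholders.
import io
import os

import httpx
import PIL.Image
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-2.0-flash")

image = PIL.Image.open(
    io.BytesIO(httpx.get("https://example.org/toad.jpg").content)
)

prompts = {
    "basic": "Write alt text.",
    "generic": "Describe this image for a blind person.",
    "detailed": (
        "Provide a concise image description for accessibility purposes. "
        "Here is an example of how the output should look: 'In a living "
        "room, a father reads a newspaper, a daughter studies, and a son "
        "chats with his mother.'"
    ),
}

for name, prompt in prompts.items():
    response = model.generate_content([prompt, image])
    print(f"--- {name} prompt ---")
    print(response.text)
```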

We found that the detailed prompt gave the shortest, clearest, and most structured descriptions across all the AI services. The basic alt text prompt gave medium-length responses, a bit more descriptive but still not overly long. The generic “describe” prompt usually gave the most detailed and often interpretive descriptions.

Conclusion: The prompting style does matter. The tone and depth of the description can be shaped depending on how you ask for a description. This could enable an organization to provide an example description for the AI service to match in all subsequent descriptions.

Myth 3: All AI Services Are The Same

There are many AI services, and most of them claim to support describing images. So how different are they?

The three services that we tested were chosen because they were among the best at generating image descriptions. Finding AI services that do a bad job of generating image descriptions remains quite easy.

Our testing highlighted that these three AI services all have strengths and perform slightly differently when they generate image descriptions.

We found that:

  • Gemini Flash was excellent at species and landmark recognition, and often concise and detail-focused. However, it could sometimes be very minimal and skip essential details.
  • ChatGPT Turbo provided a balanced level of detail and delivered very consistent results, but rarely identified species unless the detail was obvious or it was specifically prompted.
  • Claude Sonnet generated rich descriptive details useful for complex images or extended descriptions, but we found it could be verbose, adding background detail not typically relevant in a description.

Conclusion: Each AI service is unique with strengths and limitations. It is best to test a number of services yourself and experiment until you find a service that best suits your content type and workflow.
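
For anyone wanting to run such a comparison, a simple harness that sends the same image and prompt to each provider’s Python SDK might look like the sketch below. All model names are illustrative and will likely need updating, and error handling is omitted for brevity.

```python
# Hypothetical side-by-side harness: one image, one prompt, three SDKs.
# Model names are illustrative; swap in current vision-capable models.
import base64
import io
import os

import anthropic
import google.generativeai as genai
import httpx
import PIL.Image
from openai import OpenAI

IMAGE_URL = "https://example.org/figure.jpg"
PROMPT = "Provide a concise image description for accessibility purposes."

def describe_with_openai(url: str) -> str:
    client = OpenAI()
    r = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": PROMPT},
            {"type": "image_url", "image_url": {"url": url}},
        ]}],
    )
    return r.choices[0].message.content

def describe_with_gemini(url: str) -> str:
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash")
    image = PIL.Image.open(io.BytesIO(httpx.get(url).content))
    return model.generate_content([PROMPT, image]).text

def describe_with_claude(url: str) -> str:
    client = anthropic.Anthropic()
    data = base64.standard_b64encode(httpx.get(url).content).decode("utf-8")
    r = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=300,
        messages=[{"role": "user", "content": [
            {"type": "image", "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": data,
            }},
            {"type": "text", "text": PROMPT},
        ]}],
    )
    return r.content[0].text

for name, describe in [
    ("Gemini Flash", describe_with_gemini),
    ("GPT-4o", describe_with_openai),
    ("Claude Sonnet", describe_with_claude),
]:
    print(f"=== {name} ===")
    print(describe(IMAGE_URL))
```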

Myth 4: All AI Services Make Mistakes

How reliable are AI services? Do they all make mistakes? Are some more mistake prone than others?

All the services we tested performed incredibly well when writing descriptive text, but they also all made a variety of errors. These errors ranged from very small details about background elements to factually incorrect statements about the primary image content.

Examples:

  • An image of a plant is described as being in a garden, which is very plausible but not shown in the image.
  • The description of lighting and shadows in an image was reversed: the “light source on the right” was in fact on the left.
  • A background wall was described as wooden boxes.

Conclusion: AI image description services can provide helpful context, and sometimes a framework for alt text or extended descriptions, but any image description needs to be checked prior to publication.

Final Thoughts

AI-generated image descriptions can be useful for non-critical tasks, or for providing a first draft for editing. But a human review is essential to ensure the description is accurate and the volume of content is appropriate for use.

As AI services are evolving rapidly, it is likely that the quality of descriptions will continue to improve over a relatively short time. As users of these services, we all need to continually test and adapt our practices to ensure we make the most of these new developments.