Editorial graphic reading “Copilot: The Honest Witness,” with an abstract eye, layered interface-like panels, and muted gray, black, and teal elements suggesting observation, uncertainty, and AI interpretation. — Copilot was the most honest AI system I tested. It found real sources, admitted what it could not verify, and mostly avoided the usual performance of machine certainty. Then it made the quieter mistake: it confused evidence with understanding. That mistake matters more than it looks like it does.

There is a particular kind of frustration that comes from watching someone do almost everything right and still miss the point. Copilot gave me that experience.

Over the last few weeks, I have been stress-testing five major AI systems by asking each of them the same questions about my work. I used myself as the test subject for a simple reason: I know the material. If a system gets me wrong, I will catch it. If it gets me right, I will know why.

Copilot was the last one I tested. By then, I had a clear sense of the usual failure modes: systems that invented citations, systems that could not find external sources, systems that described my work accurately at the surface level and then stopped just short of understanding what holds it together. I was braced for more of the same.

What I got instead was something I had not expected: a system that knew what it did not know.

When I pushed back on an inaccurate answer about my major projects, Copilot did not double down. It said, in substance: I can see your domain. I cannot follow your outbound links. Anything beyond that would be fabrication, and I am not going to fabricate it.

That is not nothing.

Most AI systems, when pressed, generate something that sounds like an answer. Copilot declined. In the context of this experiment, that was notable. It did not solve the problem, but it did identify the boundary. From a trust standpoint, that matters. From a discoverability standpoint, it is also a problem.

Copilot did find real external sources. It surfaced the Talking Picturesinterview by Alasdair Foster, LensCulture awards, Aline Smithson's writing on Lenscratch, Analog Forever Magazine, and the Rhode Island Center for Photographic Arts. That is a legitimate result set, not just my own domain reflected back at me. A couple of the other systems I tested could barely get past my website. Copilot got further.

The images it returned were accurately mine. It got the awards largely right once it reached the relevant sources. It correctly identified the Talking Pictures interview as a primary source and leaned on it appropriately.

I did have to push it to get there. This was not magic. It was supervised labor.

Then I asked which artists were most similar to me.

That is where I watched a capable system make a silly choice and call it rigor.

Copilot listed the five other artists in the RICPA Behind the Lens: 2024 exhibition. Then it added a handful of artists I had featured in my FRAMES columns. I could see the logic: it lacked direct evidence of practice-based similarity, so it relied on curatorial and editorial adjacency.

I understand why it did that. It was trying not to hallucinate or fabricate. That is the right instinct.

But curatorial adjacency is not the same thing as artistic similarity. Being in the same group exhibition does not mean two artists are in conversation. It means a curator made a selection. Those are different things, especially in an annual survey exhibition of photography by women during Women's History Month — which is what RICPA's Behind the Lens: 2024 was — where the selection criteria are gender and contemporaneity, not practice-based affinity. That Copilot used it as a proxy for similarity says less about the exhibition than it does about the limits of citation-based reasoning.

A stronger answer would have mapped my work through actual practice: hybrid process, grief as structure, political post-photography, the sustained use of beauty as a lure toward difficult content, and the unstable relationship between photography and evidence. It would have compared those concerns to artists working in genuinely related territory, whether or not we had ever shared a wall.

That requires knowing the field well enough to make the connection.

Copilot did not go there. It stayed inside what it could cite and called that completeness.

Like the other systems, Copilot identified the obvious cluster of themes: loss, memory, grief, female identity, hybrid process, and social critique. That is accurate. Those themes are present. They are consistent. They are documented on my site and in the press.

But those themes are not the engine.

They are the evidence left behind after the engine has already done its work.

What none of the systems identified was the thing connecting those themes: the reason a practice can move from Hawaiʻi tourism critique to Roe v. Wade, from marital crisis to legal blindness, from political satire to social media identity, and not be read as just a scattered pile of concerns.

The organizing principle is disillusionment.

Not as a mood. Not as a tone. As a generative condition.

The collapse of the story you were told, or told yourself, about how something was supposed to go. The gap between expectation and experience. Paradise is not paradise. Marriage is not safety. The body is not reliable. Memory is not evidence. Beauty is not innocence. Photographs are not proof. Social media identities are performances. Political systems do not protect the people they claim to serve.

That is where the work starts.

Copilot could not get there.

It built a theme-frequency matrix. It correctly identified hybrid process as one of the most structurally dominant elements across my practice. It generated a clean curatorial narrative. Much of it was accurate.

It was also hollow.

It was the kind of thing one might write when trying to make a body of work sound intentional without actually knowing what the intention is. The words were organized. The logic was missing.

I do not say that as a condemnation. I say it because this is the precise gap I have been trying to map.

AI systems can identify what a practice contains. They can catalog, cross-reference, summarize, matrix, and arrange. They can produce a plausible curatorial paragraph. They can sound competent. Often, they even are competent.

But that is not the same as understanding.

Copilot was the most honest system I tested. It knew its limits and said so. It found real external sources, whereas some of the other systems mostly returned reflections from my own website. It avoided flamboyant forms of fabrication. Those are genuine strengths.

But the most useful finding may be this: honesty about limitations and accuracy within those limitations are not the same thing as interpretation.

A system can be correct about nearly everything it tells you and still not know what it is looking at.

I know what my work is about. The question was whether Copilot did.

It just had the grace to be uncertain.

- Q: What is “Copilot—The Honest Witness” about?
  A: It is an essay about testing Microsoft Copilot as an AI research and answer system for contemporary art. Diana Nicholette Jeon uses her own practice as the test subject because she knows when the machine is wrong, lucky, or merely fluent.
- Q: What did Copilot do well in the test?
  A: Copilot found real external sources, including interviews and art publications, and it recognized when it could not verify claims. That refusal to fabricate was not glamorous, but it was important. In AI systems, “I don’t know” is sometimes the most intelligent sentence in the room.
  Q: Where did Copilot fail?
  A: It confused documented adjacency with artistic similarity. It treated shared exhibitions and editorial proximity as evidence of related practice, when those are not the same thing. A curator placing artists in the same exhibition does not automatically mean the artists are in conversation.
  Q: What did Copilot miss about Diana Nicholette Jeon’s work?
  A: It identified visible themes such as grief, memory, loss, female identity, hybrid process, and social critique. But it missed the deeper structure: disillusionment as the generative condition behind the work. The essay argues that themes are evidence, not the engine.
  Q: Why does this matter for artists and AI search systems?
  A: Because being findable is not the same as being understood. AI systems can summarize, cite, cluster, and sound competent while still flattening the work into a neat pile of labels. Artists need accurate source trails, but they also need interpretation that does not mistake metadata for meaning.

Copilot—The Honest Witness

Teaching ChatGPT the Shape of Your Voice