If you present someone else’s work as if it were your own, no one will respect you, and you won’t even respect yourself. If someone is paying you to do thoughtful work, and they come to believe you are not thinking, after all, they will lose their enthusiasm to pay you. If you send someone a lot of data and ideas, and it comes to light that all you did was write a little prompt then paste the results into a file, then at least one of two things will happen: they will like the results and realize that AI can replace you, or they won’t trust the results and feel that you are spamming them.
In this modern world I find I am developing a new reflex. My first impulse, when someone I don’t know well shares any report, analysis, or apparently thoughtful writing, is to not believe it’s real. I am automatically discounting the value of work from people who lack a reputation for original thinking.
And I bet you are doing the same thing, aren’t you? We are already awash in a world of deceptive automated communication. You are already far too smart to fall for typical email spam or text messages from “Diana” asking you to “get tacos” with her and then apologizing for texting to the wrong number. (That scam doesn’t even make sense. When someone gives me his number, the very first thing I do is text him “its me” and then when he responds “yep” I add him as a contact. Nobody looks up someone else’s number out of the blue and sends them a taco feasting request as the first message ever.)
Even if you are an ardent AI fanboy, wouldn’t you prefer to prompt AI for yourself instead of reading someone else’s article generated from a prompt? “Write an essay about AI that would impress a credulous person. Include bullet points or emojis or whatever. You know what to do.”
If you are too young to know this, I’ll tell you: when web search engines first came out, no one was saving and sharing web search pages with each other. If they shared anything, they would share the query: the prompt. They’d share insights on tricks and tips for Googling, not list all the hits that came back. The result of a Google search is transient, ephemeral, not something that anyone should publish in a book or use as a social media post. I think AI is best used like that: as a personal tool that tells you things you don’t share directly with other people.
Even if you think AI produces consistently good work, you are playing with fire if you let it infect your work. The “fire” is the way other people will begin to assume that all of your genuine work is actually the product of AI. Thus using AI injudiciously could taint your reputation for anything else you do.
Maybe you want to shout back “but James I am partnering with AI! I’m in charge! It’s really my ideas, and the AI is just wordsmithing it!” My response is: Good luck expecting anyone to believe that.
Is it true that you are the senior author of your “AI-powered” writing? I can’t know unless I watch you do the work. But I’m not going to spend the time to do that. Only if you have a strong reputation for original work will I be willing to give you the benefit of the doubt. And I believe I’m not alone. I believe you will come to have the same attitude as I do, and be equally suspicious of other people’s work, whether you admit it or not. In your heart you know I’m right.
Some people say “AI will not replace you. Someone using AI will replace you.” I don’t believe that is true, but for the sake of argument, let’s say that it is. What would happen next? After the world is full of people using AI to do their work, how do humans differentiate themselves? If we can’t differentiate ourselves, then the cost of labor will plummet. If the only way we differentiate ourselves is to be clever about using AI, how will anyone know? AI produces so much slop that it’s practically impossible to review and vet it.
You know what, I do use AI in my work. But here are my rules:
- In general, I treat AI as “automated irresponsibility.” I think of it as a precocious child that may have good ideas, but is totally unreliable.
- I never have AI write any text that goes out under my name. Not one sentence. Ever. I don’t even turn on autocorrect on my phone, for fear it will insert a word I didn’t intend.
- I never rely on an AI-based tool that I do not know to have been systematically and deeply tested. Especially if the tool has to process large amounts of data to give me answers that are difficult to independently verify.
- I never rely on an AI-generated summary of a larger text. (And yes I have been burned by a client who used a tool to summarize my work and came to fundamentally wrong conclusions about my beliefs based on that 20% wrong summary.)
- I allow myself to use GPT queries and deep research tools to explore facts and ideas as a first cut. However, I guard against the probability of AI bias by using other sources outside of AI, and I guard against hallucinations by reviewing every source for myself.
- I allow myself to use a GPT to analyze and critique my own work. This is safe because even if it is wrong in its critique, it might still identify a weakness that I can address.
- I allow myself to use a GPT to make prototypes and throwaway versions of tools. This is fast and fairly safe (although there is a growing danger of hackers poisoning us with malware in libraries that GPTs hallucinate into existence).
I want there to be nothing in my ‘I’ but me.
Postscript
I asked ChatGPT to summarize the post, above. Here is what it produced:
The author argues that overreliance on AI for creative or intellectual work undermines credibility, trust, and self-respect. If people realize you’re using AI to generate content without meaningful input, they’ll either feel deceived or conclude you’re replaceable. In a world flooded with AI-generated content, skepticism grows—people instinctively doubt the authenticity of others’ work unless they have a proven reputation for original thinking. Sharing AI outputs as if they were your own ideas erodes your uniqueness and may lead others to question all your future work. The author uses AI cautiously, treating it as a flawed tool for exploration or prototyping, but never for producing final work under their name. Their guiding principle: “There should be nothing in my ‘I’ but me.”
Like most things GPTs produce, it’s not terrible. Just as a gourmet meal isn’t terrible just because there is a hair in it. Still, we consider it unacceptable to have hair in our food. This is why I don’t use AI summaries.
The parts I have crossed out were where ChatGPT got it wrong.
- I’m not arguing that overreliance undermines credibility, etc., but rather that any reliance does so (regardless of the merits of the work).
- I’m not warning against people discovering that you are using AI without meaningful input (a phrase that does not appear in my essay) but rather that they discover you are using it at all to produce your work.
- It’s not your uniqueness that is primarily threatened when you use a GPT to write for you and it’s not clear what is you and what is AI; it’s your credibility. Uniqueness is threatened, too, but that has nothing to do with presenting the work as your own; it will occur even if you are totally open about using AI.
- ChatGPT’s choice of my guiding principle is more what I would call a summarizing thought. For a guiding principle, it should have chosen: “think of AI as a precocious child that may have good ideas, but is unreliable.” But if it were truly insightful, it could do better. Here is a guiding principle I would take away from this essay: “People judge you not only by the work you show them, but by their beliefs about what you didn’t show them. So protect your reputation, lest AI seem like a better bargain than you.”
Hi James – I like to keep an open mind, so we are looking into ways that AI tools can support our work (despite my overall skepticism). Right now I can honestly say that CoPilot Chat saved me about half a day when I needed to do some data analysis, and did not know the best way to set up Excel to do this. Maybe not the most exciting use case, but on the third or fourth attempt it suggested something that worked.
[James’ Reply: Can you be more specific? Did the LLM do the analysis itself (I’ve found that unreliable) or did it write a program to do it, or did it just tell you how to do it?]
Looking into other claims made – once you cut through the endless hype and try to put it into the context of a working test team that prioritizes tester autonomy and tester skill (which is the way I like to think of my current team), the benefits look pretty marginal over what other tools and processes already offer.
To pick one example – claims are made that LLMs can write test cases for you… If you are still relying on written test cases, and are prepared to lose most of the very limited value you get out of preparing a test case (the bit where you hopefully sit down and do some thinking about testing), then you might find this attractive. But the prospect of restricting myself in this way horrifies me… Try building a visual coverage model collaboratively, and basing your testing on this instead.
Self-healing automated checks are another claim made – to which my response is that a properly designed set of automated checks probably doesn’t need this? It might not be universally true, though; there may be some edge cases where this would be worth the effort. I just can’t think of any.
[James’ Reply: Always ask what sort of “healing” is meant. How does the system distinguish between “bug” and “healing needed?”]
Anyway – we are looking into the claims. As a toolset used judiciously by individuals to support their work and extend their reach, fine – but don’t believe the hype.
In this instance I described the problem I was trying to solve and asked for suggestions for an Excel formula. So essentially I just used CoPilot Chat as a search engine. It took three or four attempts to come up with a workable formula.
So – not a very exciting example.
As for “self-healing tests” – I have the same question. Right now my team maintains a reasonably useful set of API checks that run as part of our deployment pipelines. Every morning a couple of us check a dashboard, and if we see unexpected results, one of us jumps in to investigate.
It usually takes 15 minutes or so to look into an unexpected result. I consider this to be well worth the time. It keeps us in touch with the checks, and leads to conversations about long-term trends. I struggle to think of a circumstance where I would replace this process with something much more opaque like “self-healing tests.”
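To give a concrete sense of what those checks look like, here is a minimal sketch of the sort of thing that runs in the pipeline, written with Python’s requests library; the URL, endpoint, and fields are made-up placeholders rather than our actual service.

```python
# Minimal sketch of one pipeline API check (hypothetical URL and fields).
# Each check hits an endpoint, asserts on the status and a couple of
# response fields, and reports a pass/fail that the dashboard aggregates.
import requests

BASE_URL = "https://staging.example.com/api"  # placeholder, not a real service


def check_orders_endpoint():
    resp = requests.get(f"{BASE_URL}/orders", params={"limit": 1}, timeout=10)
    assert resp.status_code == 200, f"unexpected status {resp.status_code}"
    body = resp.json()
    assert "items" in body, "response is missing the 'items' field"
    return "PASS"


if __name__ == "__main__":
    print(check_orders_endpoint())
```

When one of these fails, it is a human who decides whether it is a bug, an environment problem, or a check that needs updating.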
Most of the examples I have seen discussed talk about things like “tests” breaking because an element moved on a web page or a menu structure changed, to which my response is:
– Encourage your developers to write more testable code
– Strongly consider using people for this sort of testing anyway
Hi James, you’ve been a big influence on me as a tester for two decades now, so I was curious to see what you thought about the modern state of AI usage. I was wondering, also, whether and how you have integrated AI tools into your testing process (or, rather, I guess, into a testing process you help your clients establish).
[James’ Reply: I need to approach this responsibly, so I am experimenting with the technology. So far, I am troubled by the poor reliability of the tools and the slippery and vague nature of the way we are told to use these tools.]
I’ve been using LLMs, mostly chatGPT, near daily for about a year now. Most of it is unrelated to work: I do a lot of creative writing, and I find it helpful for feedback or bouncing ideas against.
[James’ Reply: What sort of feedback can it give you that you could trust?]
What I noticed largely aligns with what you talk about: people mistrust any mention of AI, not bothering with nuance, and AI absolutely cannot be trusted or relied upon to do the actual quality writing, as it hallucinates at every turn and cannot handle large texts or any kind of complicated continuity or reasoning. It makes sense, too – LLMs do not understand; they mimic understanding via finely tuned statistics engines. I’m simplifying, but who isn’t. 🙂 Anything original, which by definition doesn’t have a massive dataset behind it that an LLM could have been trained on, pretty much leaves the LLM grasping at straws and fumbling.
It seems to be the same in testing/development, as it is in creative writing.
I keep hearing about the craze of using AI tools in testing, and I keep checking in and looking for something game-changing, but so far I found nothing.
[James’ Reply: I can think of a couple of potential game changing things GenAI might do IF they could be done reliably and not too expensively. I’d like AI to watch everything I’m doing and be able to write searchable notes on all the things I did and said and saw. I’d like GenAI to be able to crunch all known information about a product and make an outline of testable elements. Stuff like that.]
At least, not for the type of testing I do. The most useful AI-based tool I’ve encountered is GitHub Copilot, saving me from having to google syntax when I forget yet again how to do something in a specific language/library. The latest agentic feature is hit or miss: it can generate very simple API-level auto tests when I tell it exactly what to do, but that level of tests can also be quickly “generated” with the good old “copy-paste-modify.” Still, it has its use, and is a bit more exciting than the aforementioned copy-paste-modify. I’ve also used chatGPT to look at the requirements and the factors that modify system state and give me a coverage matrix. Again, it lacks understanding, so the results are just a first draft, but it helps kick-start the work. This is a very basic usage of LLMs, as I see it.
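To make the coverage-matrix part concrete, here is a rough sketch of what that first draft amounts to, done by hand in Python; the factor names and values are invented placeholders, not the real requirements.

```python
# Rough sketch of a first-draft coverage matrix: enumerate every combination
# of the factors that modify system state. Factor names/values are invented.
from itertools import product

factors = {
    "user_role": ["admin", "member", "guest"],
    "account_state": ["active", "suspended"],
    "locale": ["en", "de"],
}

# Full cartesian product; the real work is a human pruning and annotating
# this, because not every combination is meaningful or worth testing.
matrix = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for i, row in enumerate(matrix, start=1):
    print(i, row)

print(f"{len(matrix)} candidate scenarios before any judgment is applied")
```

The LLM’s contribution is essentially this enumeration plus plausible-sounding factor values; the judgment about which rows actually matter still has to come from the tester.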
[James’ Reply: I think kickstarting the work is dangerous, because it kickstarts you into a set of unknown biases. I suggest instead having it comment on your work, or using the “kickstarting” process AFTER you’ve already done the work, to check if you missed anything.]
There are testing tools coming out that claim to change the face of testing though (see testers.ai for example), and I keep getting excited, and then finding absolutely no use for them.
[James’ Reply: I don’t understand why a race car driver would get excited about making his job easier. Nor a tester. No one doing difficult things wants it to be easy. What we want is to be more powerful.]
Most of them are targeting testing via the web interface, for starters, and they tend to compare themselves with Selenium-Cucumber frameworks, as if that’s some kind of gold standard, and not something everyone I know has been trying to migrate away from for years now. I can’t help but wonder why we as an industry are even focusing on writing more of those kinds of autotests, when they are the most inefficient, slow, and fragile tests to have. I avoid them like the plague, personally, preferring to leave most of the high-level integration tests to humans, with a small subset of end-to-end auto tests to check that infrastructure has been deployed and hooked up correctly in different environments.
[James’ Reply: Doing good testing is hard, so of course most people focus on the easy stuff and don’t think too much about its value.]
I’d rather see AI tools do something like run an existing GUI-based test, identify API calls involved in the scenario and then assist in shifting the auto tests into the API and mocked GUI space. Haven’t seen anything in that space.
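The capture half of that doesn’t even need AI. Here is a minimal sketch of recording the API calls a GUI scenario makes, assuming Playwright for the browser automation; the URL and the scenario step are placeholders. The interesting (and so far missing) part would be a tool that turns such a capture into sensible API-level checks with the GUI mocked out.

```python
# Minimal sketch of the capture step: record the HTTP calls a GUI scenario
# triggers, so they can later be rewritten as API-level checks.
# Assumes Playwright is installed; the URL is a placeholder, not a real app.
from playwright.sync_api import sync_playwright

captured = []

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    # Record every request the page makes while the scenario runs.
    page.on("request", lambda req: captured.append((req.method, req.url)))
    page.goto("https://app.example.com/login")  # placeholder scenario step
    # ...drive the rest of the GUI scenario here...
    browser.close()

for method, url in captured:
    print(method, url)
```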
Another idea: an AI assistant “sitting” through a session of exploratory testing, taking notes, monitoring the app, and alerting the tester to anything interesting. Which, I believe, some of the tools you mentioned earlier might be closer to; I don’t know if AI is involved there.
Anyhoo, sorry for the ramble, and if you have time, I’d be keen to hear your take on AI tools in testing and all the hype around them.
[James’ Reply: What you are talking about is the kind of thing I want to experiment with.]
Hi James, thanks for taking time to reply.
>So far, I am troubled by the poor reliability of the tools and the slippery and vague nature of the way we are told to use these tools.
I’m not as much troubled by the poor reliability as I am by the poor recognition of it. I think there is not enough understanding that LLMs are statistics engines at their heart.
>What sort of feedback can it give you that you could trust?
Good question! I’m not a native English speaker, so I tend to sometimes use turns of phrase coming from my native language that just don’t work the same way in English. I ask the LLM to give me a running commentary, and when it gets things completely wrong, I know I need to pay attention to clarity in that place. Not necessarily to “fix” it (because it also doesn’t pick up on the complex stuff), but just think about it again. On the other hand, it is nice to see when the clues I put in the text are picked up and reacted to – that just shows it’s not too convoluted to follow. I also find it helpful to talk to someone and explain my ideas; that naturally pulls out more ideas.
It’s not about “trust”, it’s about having a useful tool filling the unsatisfied need for a willing and insanely patient beta-reader.
[James’ Reply: I think I could relax my distrust of someone who uses AI in this way, assuming I found them credible over time.]
>I’d like AI to watch everything I’m doing and be able to write searchable notes on all the things I did and said and saw.
I think there might be some of it already on the market? Haven’t tried it myself, but supposedly Google’s AI has a feature where you can share a remote desktop with an agent, with the idea that an agent can watch what you do, listen to you talk and help you out on the go. Plausibly, the ability to do that much shouldn’t be too far away from watching and taking notes on whatever app is on the screen.
I’d worry about privacy, though. Unless you run a model locally, you are sending info into the cloud, and I imagine for a lot of clients the new features would be proprietary information they don’t want leaked this way. It’ll be interesting to see how the laws and the expectations adjust in the next few years to account for widespread AI usage. I feel like at this point the industry is collectively closing its eyes and/or accepting that yeah, we are sending proprietary code out every time we ask an LLM to peer review it.
[James’ Reply: Yes, privacy and security are swords of Damocles hanging over our heads.]
>I think kickstarting the work is dangerous, because it kickstarts you into a set of unknown biases.
That’s fair! In that case I used it specifically because I was overwhelmed by having five degrees of freedom (various semi-independent factors affecting the behaviour) and a large number of scenarios, and outsourcing that first draft did wonders, freeing up my brain to actually think instead of trying to hold it all in. But yeah, in writing and in testing, the LLM definitely tries to adjust to you and follow your lead, which would naturally reinforce the biases and the blind spots.
[James’ Reply: But is it your work, in the end? I think I would feel alienated.]
> I suggest instead having it comment on your work, or using the “kickstarting” process AFTER you’ve already done the work, to check if you missed anything.
I question how useful it is for an experienced tester, though. Don’t we all have established processes in place to not miss the obvious stuff? I wouldn’t expect an LLM to pick up on non-obvious valuable stuff. It hasn’t so far for me, at least, but to be fair I’m not putting much time into giving it the deep context about the business side of the app; you can only glean so much from the flawed code.
Do you feed it any extra information when requesting feedback for your work (aside from work itself, I mean, whether it’s code or, I guess, test approach)? I struggle to find the cost/benefit balance and tend to lean towards “it’s easier to just do it yourself start to end than attempt to communicate all the info and context,” but I keep thinking there must be a better way to do it.
[James’ Reply: I haven’t yet done this enough to have a strong feeling about ways to do it. I did it once, though, where I just gave it the work and asked if it saw any problems. One of its replies did identify a hole in my work.]
>I don’t understand why a race car driver would get excited about making his job easier.
Because most of us aren’t just racing most of the time? 🙂 There is also the sludge of boring things that just need to be done (e.g. maintaining old test code as app code changes, or migrating from the legacy frameworks you are given), and I’m personally always keen to offload that stuff to the automation so I can use my time to do the interesting things that actually require deep thinking. I don’t want to do difficult things, I want to do interesting things. Not all difficult things are interesting.
E.g. Selenium+Cucumber combo was widely popular at some point and got severely misused, and now there are countless frameworks based on it that are a nightmare to navigate and maintain and slow to run. Not ashamed to say if I could have a magic button to migrate the hell out of it into a combo of API tests and GUI tests, I’d do it happily. But alas, we are not there yet.
[James’ Reply: You aren’t quite working within my analogy. I am speaking of a race car driver who WANTS to race cars and IS racing cars. Of course, when the driver just wants to go to the store to buy some milk, that’s a different kind of driving.
Since I’m not a race car driver, but I am an airplane pilot (or at least was in my younger days), I can say that I enjoyed the challenge of flying airplanes and did not want that to go away. What I wanted was MORE challenges, not less. However, I wouldn’t say no to cockpit automation that I could deploy and undeploy as desired, because sometimes safety concerns override anything else.]