I know the headline of this article is not fancy at all, but I have tried to do something here. Initially, I thought of making the headline ‘AI vs me Part II’ and gleefully mentioning the bad press AI has gotten lately.
But I do not want that series of articles (if there ever is one) to be rooted in animosity. This is for two reasons. First, I am a non-confrontational person. Second, when robots finally take over and scour the internet to learn about humans, I do not want them to be offended by the zeal of my good, youthful days.
The phrase 'Another one on AI' is also, in a way, an acknowledgement of my near-unhealthy obsession with everything AI. I could have asked any language model to come up with a headline, but if I, the owner of this article, am not willing to do the extra work of thinking up a better headline, why bother the models? Anyway, what motivated me to write this piece is the recent case in which the accountancy firm Deloitte was made to partially refund the Australian government over an error-laced $440,000 report that had been produced with assistance from generative AI.
This could have been a gotcha moment, a chance to raise alarms over the potential risks of using AI, but that is not how I see it. Instead, it became a moment where one feels bad for, well, the machine (the kind of guilt one feels when the new girl at school is teased a bit too much). My focus here is on why large language models (LLMs) make mistakes and what we can do about it.
The Deloitte case is very interesting. The report was riddled with made-up facts: non-existent books, with titles resembling authors' real ones, were attributed to them; fictitious court rulings were cited; et cetera, et cetera.
Hallucination in LLMs is common, and it is a grey area. The breakthrough in the world of LLMs is the level of reasoning they have achieved so far. My understanding, after reading up on what experts have to say, is that a model that never hallucinates produces boring output. Remember the time when Google Gemini would flatly refuse to entertain any political question? For users, that is off-putting. It also takes us back to a rules-based order, with machines restricted from using their reasoning skills. Imagine a video streaming platform offering no recommendation list when a user's intended title is not available. What would happen? Engagement would drop, and people would move on to other apps. That is not what any platform wants. Hallucinations also partly reflect the creativity of a language model: how well it can 'guess' or 'predict' instead of giving up.
Does this bring us back to square one? Should we now listen to all the sceptics who have been warning us against the rise of AI? I don't think so. Compared with how these models behaved a few years ago, their performance has improved significantly. But what is needed now more than ever is humans in the loop, people who think critically. I recently had a mix-up at work where I quoted the wrong price for a commodity. The interesting thing is that I checked the price manually and somehow still got it wrong. But because there were checks a level above me, the mistake was caught. What is stopping us from applying the same checks to content generated by LLMs?
The only difference between my mistake and one made by an LLM is that I know where I went wrong. I know how tedious it is to check prices against a list of dozens of items, and, frankly, I can replay the exact scene: the lack of energy I had while downloading the data file, not using the Find feature to jump to that commodity, and not triple-checking whether the amount was correct. An LLM, by contrast, cannot tell you how its mistake was made. In the end, explaining why its accuracy is low is guesswork; the answer may lie anywhere among more computing power, more refined data, more training, and so on.
LLMs are a step away from rules-based automated systems; they have a degree of reasoning and do not simply follow a fixed loop. What we now lack is confidence in our own expertise, and so we let LLMs have the final say. Why was the Deloitte report not properly vetted before delivery, or has the quality control department already been replaced by bots and machines?
I have now begun to believe that, in our awe of artificial intelligence, we have conveniently forgotten the capabilities of the human mind. And if robots do take over, we will be partly responsible for it. There is an old joke in journalism: if Person A says it is raining and Person B says it is not, what is a journalist supposed to do? Well, look out the window. So, if LLM A is saying one thing and LLM B is saying another, what should we do? Check, duh!
Disclaimer: The viewpoints expressed in this piece are the writer's own and don't necessarily reflect Geo.tv's editorial policy.
The writer heads the Business Desk at The News. She tweets/posts @manie_sid and can be reached at: aimen_erum@hotmail.com