By José Ignacio Orlando, PhD (Subject Matter Expert in AI/ML @ Arionkoder) and Nicolás Moreira (Head of Engineering @ Arionkoder).
Last week we discussed ChatGPT, an AI-powered chatbot created by OpenAI that can engage in realistic conversations from input prompts given by the user. While this tool is certainly fun to play with and might come in handy for getting quick answers to almost any question, this week we want to analyze the limits and the potential forthcoming implications of this technology.
Because it is publicly available for everyone to play around with, ChatGPT has sparked such massive interaction that it has become much more than a simple proof-of-concept demo. Social media is already full of prompts and AI-generated answers about all sorts of topics, some of them really funny (such as an essay about the connections between guitars and elevators, or a Seinfeld scene in which one of the characters is asked to talk about a sorting algorithm) and others particularly useful (it wrote a perfectly fine cover letter for a scientific paper manuscript in a matter of seconds, and proved able to solve complicated mathematical tasks such as ray tracing in geometrical optics).
However, other users (a lot of them, actually) started to notice that some of these pleasant and coherent outputs might sound good… and at the same time be partially or completely wrong! People noticed that ChatGPT can sometimes analyze computer code correctly, yet embed wrong or unrelated facts in the explanation that have nothing to do with it. Others have shown that when a question implies something is true (for instance, that "DNA computing", whatever that is, is faster than a CPU), the chatbot tends to keep reinforcing that premise in its answer. This even spills over into other fields, such as history. One of our own (thanks, Naty!) dug deep into the tool and asked ChatGPT to write a blog post about the role of cotton candy in Egyptology: the output, shown in the figure below, reads beautifully and is frankly quite funny, but it is still full of incorrect or fabricated facts.
The issue of ChatGPT providing verbose, coherent answers built on incorrect facts has already been flagged by OpenAI as one of its most prominent limitations. However, users rarely visit the page describing the tool and its caveats: they prefer to go straight to the chat box and use it. As a result, they are often unaware of this limitation and end up blindly trusting the model's outputs. This, combined with the many applications people envision for this great tool, has already started to cause trouble in a few areas.
A few days ago, Stack Overflow, the website we programmers all use when we have trouble figuring out how to solve a coding issue, decided to ban users from sharing responses generated by ChatGPT. The reason? The very same one we talked about before: some answers sound extremely good and detailed, but turn out to be wrong on close examination. Moderators started to see the site filling up with incorrect pieces of code and explanations so confusing that they began to damage the very spirit of the website.
But how do you control such a flood of credible-sounding answers without banning innocent people who simply made a mistake while trying to work out a response? Currently, Stack Overflow's moderators are applying Gandalf's rule of "if in doubt, always follow your nose": if an answer seems to be automatically generated with ChatGPT, they impose sanctions. So, unfortunately, we should expect quite a high rate of false positives in the upcoming weeks, at least until this hype wave calms down. There are no reliable models yet for automatically detecting which pieces of text were generated by ChatGPT, although OpenAI has already started working in that direction.
The source of the problem is that people who get answers from ChatGPT are not taking the time to review them and verify the facts. Yet another example of confirmation bias. And who are we to cast the first stone, right? Having a chatbot that solves our code issues is definitely a dream come true: in a matter of seconds, we can get an answer to something that might otherwise have burnt our brains for hours. We now have in our hands a fantastic tool that can explain to us in detail what we were doing wrong and how to fix it, increasing our ticket-resolution rate.
So why shouldn't we believe those answers? Well, we shouldn't blindly trust any AI response because AI models are trained on data produced by humans, and we already know that we are far from perfect. ChatGPT was trained on a humongous dataset of 500 billion tokens scraped from the internet, including computer code taken from tens of millions of public repositories, without any control over which code was buggy and which was not.
Furthermore, even if all the pieces of code in the training data were correct, we cannot guarantee that the model will learn how to extrapolate that knowledge to any new problem we throw at it. Remember, these large language models are stochastic parrots: they don't have hard-coded rules at their core to solve any given situation, but millions of parameters learned with the sole purpose of mimicking the way we use language, as the sketch below illustrates.
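To make the "stochastic parrot" idea concrete, here is a minimal, purely illustrative sketch in Python. It is not ChatGPT's actual architecture (that uses billions of learned parameters, not a bigram table), but the generation loop is the same in spirit: at every step the model samples a statistically plausible next token, and nothing in that loop ever checks whether the resulting statement is true.

```python
import random
from collections import defaultdict

# Toy "language model": a bigram table built from a tiny made-up corpus.
# Real LLMs learn far richer statistics, but the core idea is the same:
# continue the text with something that *sounds* likely given what came before.
corpus = "dna computing is faster than cpu computing because dna is parallel".split()

# Count which word follows which in the corpus.
transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(prompt_word: str, max_tokens: int = 8) -> str:
    """Sample a continuation one token at a time, purely from observed statistics."""
    output = [prompt_word]
    for _ in range(max_tokens):
        candidates = transitions.get(output[-1])
        if not candidates:                      # no observed continuation: stop
            break
        output.append(random.choice(candidates))  # plausible, not necessarily correct
    return " ".join(output)

print(generate("dna"))
# e.g. "dna computing is faster than cpu computing because dna"
# Fluent-sounding output, yet no step in the loop verified the claim itself.
```

The point of the sketch is simply that fluency comes from pattern matching over training text; factual correctness has to be checked by the human reading the output.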
The implications of having AI assistants for computer programming are fantastic and risky at the same time. Tools like GitHub Copilot or DeepMind's AlphaCode are already deployed and used by thousands of programmers around the world to accelerate their throughput. However, we still don't have numbers reflecting how much effort is needed to adapt their outputs to the actual requirements, figure out potential sources of errors, and find appropriate ways to correct them. It's up to us, the humans in the loop, to stay alert and put more trust in our noses than in blind acceptance of the tool's outputs, while still using them to our advantage.
Do you need AI tools or NLP models to improve your business processes? Reach out to us so we can help you leverage this technology to your own benefit!