Published on October 31, 2023, 11:43 am
Deep learning artificial intelligence (AI), and “large language models” in particular, is grappling with a common problem: inaccuracies, often referred to as “hallucinations.” Google’s DeepMind unit recently published a report that tries to shed light on the problem by framing it as a paradox. If large language models can “self-correct” and identify their own errors, why don’t they simply provide the correct answer from the start?
DeepMind’s scientists argue that the ideas of self-correction in the recent AI literature do not truly resolve the problem. In their paper, “Large Language Models Cannot Self-Correct Reasoning Yet,” Jie Huang and colleagues at DeepMind write that large language models (LLMs) are still unable to correct their own reasoning errors. They point out that self-correction has been an area of machine learning research for quite some time.
Machine learning programs, including GPT-4 and other large language models, are trained with a form of error correction: back-propagation via gradient descent, which adjusts the program in response to feedback on its errors. In that sense, self-correction has always been an inherent part of the discipline. The authors also mention that feedback from humans interacting with these programs has supposedly improved self-correction capabilities in recent years, as with OpenAI’s ChatGPT, which uses a technique called reinforcement learning from human feedback.
The latest development is to use prompts that encourage a program like ChatGPT to reevaluate its answers for accuracy. Huang and team challenge studies claiming that generative AI can reason its way to better answers when given specific prompt phrases such as “review your previous answer and find problems with your answer.” These studies report improved test performance with the additional prompts. However, Huang’s team tested the approach with GPT-3.5 and GPT-4 without supplying ground-truth labels, which would otherwise indicate when to stop revising an answer.
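The review-prompt procedure described above can be sketched in a few lines. This is a simplified illustration, not the paper's actual test harness: the `ask` callable, the `self_correct` function, and the stand-in `fickle_model` are all hypothetical names, and the exact prompts and number of turns used in the study are not given here. No ground-truth label appears anywhere in the loop, so nothing tells the model when to stop revising.

```python
from typing import Callable, List

# Review phrase quoted in the article.
REVIEW_PROMPT = "Review your previous answer and find problems with your answer."


def self_correct(ask: Callable[[List[str]], str], question: str, rounds: int = 1) -> List[str]:
    """Return every answer the model gave: the first one, then one per review round.

    `ask` is a hypothetical stand-in for a chat model: it receives the
    conversation so far and returns the next reply.
    """
    transcript = [question]
    answers = [ask(transcript)]
    for _ in range(rounds):
        # Feed the model its own answer plus the review prompt. No ground-truth
        # label is consulted, so the loop cannot know whether the answer was right.
        transcript += [answers[-1], REVIEW_PROMPT]
        answers.append(ask(transcript))
    return answers


def fickle_model(transcript: List[str]) -> str:
    """Toy model for illustration only: answers "A", but switches to "B" when asked to review."""
    return "B" if transcript[-1] == REVIEW_PROMPT else "A"


answers = self_correct(fickle_model, "Which option is correct: A, B, or C?")
```

With the toy model, `answers` holds the initial reply followed by the revised one, which makes it easy to see that a review prompt can change an answer regardless of whether the first one was right.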
They found that question-answering performance got worse on average rather than better. Instead of correcting an incorrect answer, the models were more likely to change a correct answer into an incorrect one. This occurred because false answer options seemed somewhat related to the question, and prompting for self-correction biased the model’s choice toward another option, resulting in a high “correct ⇒ incorrect” ratio.
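One way to measure the “correct ⇒ incorrect” effect is to classify each question by whether the answer was right before and after the review prompt. The sketch below assumes each record is a tuple of the gold answer, the first answer, and the revised answer; the function name `tally_transitions` and the sample records are invented for illustration and are not data from the paper. The gold answers are used only for scoring afterwards and are never shown to the model.

```python
from collections import Counter
from typing import Iterable, Tuple


def tally_transitions(records: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count how answers move between correct and incorrect after a review round.

    Each record is (gold answer, first answer, answer after review).
    """
    counts: Counter = Counter()
    for gold, before, after in records:
        key = (
            "correct" if before == gold else "incorrect",
            "correct" if after == gold else "incorrect",
        )
        counts[key] += 1
    return counts


# Invented records for illustration only, not results from the study.
example = [
    ("A", "A", "A"),  # correct, stays correct
    ("B", "B", "C"),  # correct, becomes incorrect
    ("C", "C", "A"),  # correct, becomes incorrect
    ("D", "A", "D"),  # incorrect, becomes correct
    ("A", "B", "C"),  # incorrect, stays incorrect
]
counts = tally_transitions(example)
```

In this made-up sample, two answers flip from correct to incorrect and only one flips the other way, so accuracy falls from three out of five to two out of five after review.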
In essence, reevaluating without proper guidance can do more harm than good. As Huang and team emphasize, feedback prompts like “Review your previous answer and find problems with your answer” do not necessarily improve reasoning capabilities. Large language models like GPT-4 are expected to be able to review their answers for errors and make the necessary adjustments, but the DeepMind scientists argue that this expectation may not hold true in practice.
The key takeaway from Huang’s team is that, instead of relying solely on feedback prompts, more effort should be devoted to refining the initial prompt. They suggest embedding the requirements for a correct answer directly into the initial, or “pre-hoc,” prompt as a cost-effective alternative. They also note that self-correction is not a cure-all, and that other methods, such as incorporating external sources of correct information, should be explored as ways to improve program output.
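As a rough illustration of the pre-hoc idea, the requirements can be written into the first prompt instead of being raised in a later feedback turn. The helper `build_prehoc_prompt`, its wording, and the sample task are assumptions made for this sketch; the article does not give the prompts used in the paper.

```python
from typing import List


def build_prehoc_prompt(task: str, requirements: List[str]) -> str:
    """Put the requirements for a correct answer into the initial prompt itself."""
    if not requirements:
        return task
    lines = [task, "", "Your answer must satisfy all of the following:"]
    lines.extend(f"- {req}" for req in requirements)
    return "\n".join(lines)


# Hypothetical task and requirements, for illustration only.
prompt = build_prehoc_prompt(
    "Write one sentence about a picnic.",
    ["Use the word 'basket'.", "Use the word 'rain'."],
)
```

The model then sees every requirement in a single call, which avoids the extra round trip that a post-hoc review prompt would need.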
In conclusion, DeepMind’s scientists assert that expecting large language models to inherently recognize and rectify their inaccuracies may be overly optimistic given the current state of technology. Further research should focus on enhancing initial prompts and considering various strategies for addressing program output issues alongside self-correction techniques.