Computing Reviews
Shortcut learning of large language models in natural language understanding
Du M., He F., Zou N., Tao D., Hu X. Communications of the ACM 67: 110-120, 2024. Type: Article
Date Reviewed: Nov 21 2024

Du et al. write: “The shortcut learning behavior has significantly affected the robustness of [large language models, LLMs].” Predictions in these models “rely on dataset artifacts and biases within the hypothesis sentence.” LLMs such as GPT-3 and T5 use a prompt-based training paradigm, in which a snippet of text is provided to the model as input. The LLM is expected to provide a relevant completion of this input; for example, the word “Hello” could be a snippet for: “Hello, how can I help you?”
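To make the prompt-completion idea concrete, here is a deliberately minimal sketch (my own illustration, not the paper's method): a real LLM predicts a continuation from learned statistics over text, whereas here a hypothetical lookup table stands in for the model.

```python
# Toy stand-in for prompt-based completion: a real LLM (e.g., T5 or
# GPT-3) would generate the continuation; this lookup table merely
# illustrates the input/output contract of the paradigm.
completions = {
    "Hello": "Hello, how can I help you?",
    "Thank you": "Thank you, have a great day!",
}

def complete(prompt: str) -> str:
    """Return the stored continuation for a known prompt,
    or echo the prompt if no continuation is known."""
    return completions.get(prompt, prompt)

print(complete("Hello"))  # -> Hello, how can I help you?
```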

This article presents a comprehensive performance review of “the shortcut learning problem in the pre-training and fine-tuning training paradigm of medium-sized language models (typically with less than a billion parameters).” A model’s lack of robustness is attributed to biases in the training data; here, “shortcut” refers to training based on non-robust features, which fails to capture robust features and high-level semantics. Non-robust features still help generalization to development and test sets as long as the patterns in the new data resemble the patterns in the data the model was trained on.
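The shortcut phenomenon can be sketched with a toy example (my own construction, not from the article): a predictor that latches onto a spurious surface feature scores well when test data follow the training distribution (IID), but fails once that correlation breaks (OOD).

```python
# Toy sketch of shortcut learning: in the IID data the token "not"
# always co-occurs with the negative label (0), so a predictor built
# on that single non-robust feature is perfect IID yet fails on OOD
# examples where the correlation no longer holds.
iid_data = [("the plot was great", 1),
            ("it was not good", 0),
            ("i did not like it", 0),
            ("a great cast", 1)]
ood_data = [("not bad at all", 1),            # "not" with a positive label
            ("great if you like boredom", 0)]

def shortcut_model(sentence: str) -> int:
    # Shortcut: predict negative (0) whenever the token "not" appears.
    return 0 if "not" in sentence.split() else 1

def accuracy(data):
    return sum(shortcut_model(s) == y for s, y in data) / len(data)

print(accuracy(iid_data))  # 1.0 on IID data
print(accuracy(ood_data))  # 0.0 on OOD data
```

The wide gap between the two accuracies mirrors the IID/OOD generalization gap the article attributes to reliance on dataset artifacts.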

Comparisons between LLMs of similar architecture but different sizes, for example, BERT-base with BERT-large and RoBERTa-base with RoBERTa-large, show that the large versions perform consistently better than the base versions, with a smaller accuracy gap between out-of-distribution (OOD) and independent and identically distributed (IID) test data. This shows that “smaller models are more prone to capture spurious patterns and are more dependent on data artifacts for prediction.” Standard training procedures are biased toward learning simple features, referred to as simplicity bias, and remain invariant to complex predictive features. The authors explain:

Models tend to learn non-robust and easy-to-learn features at the early stage of training. For example, reading comprehension models have learned the shortcut in the first few training iterations, which has influenced further exploration of the models for more robust features.

Additionally, “the present LLM training methods can be considered as data-driven, corpus-based, statistical, and machine-learning approaches.” While a data-driven approach may be good for certain natural language processing (NLP) tasks, “it falls short in relevance to the challenging NLU tasks that necessitate a deeper understanding of natural language.” IID performance is on par with human performance, but OOD performance is far below both human and IID performance; “debiased algorithms are thought to achieve better generalization because they can learn more robust features than biased models.” A study of the robustness of prompt-based, very large language models such as GPT-3 and GPT-2 found that LLMs are susceptible to majority label bias and position bias, that is, “they tend to predict answers based on the frequency or position of the answers in the training data.”
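Majority label bias can be illustrated with a hedged sketch (my illustration, not the study's code): a predictor that ignores the query and answers with whichever label dominates the in-context examples reproduces the biased behavior described above.

```python
from collections import Counter

# Toy sketch of majority label bias in few-shot prompting: the
# (hypothetical) biased predictor never looks at the query, only at
# the label distribution of the in-context examples.
few_shot_examples = [("the food was cold", "negative"),
                     ("service was slow", "negative"),
                     ("terrible parking", "negative"),
                     ("lovely decor", "positive")]

def majority_label_prediction(examples, query: str) -> str:
    """Ignore the query; return the most frequent label in the prompt."""
    label_counts = Counter(label for _, label in examples)
    return label_counts.most_common(1)[0][0]

print(majority_label_prediction(few_shot_examples, "amazing dessert menu"))
# -> negative, despite the clearly positive query
```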

The article concludes that the current standard of data-driven training results in models that perform low-level pattern recognition, which is useful for low-level NLP tasks. For more difficult natural language understanding (NLU) tasks, it is necessary to introduce “more inductive bias into the model architecture to improve robustness and generalization beyond IID benchmark datasets,” as well as “more human-like common sense knowledge into the model training.” Furthermore, “the current pure data-driven training paradigm for LLMs is insufficient for high-level natural language understanding.” To achieve that, a future data-driven paradigm “should be combined with domain knowledge at every stage of model design and evaluation.”

Overall, the article provides a unique and precise review of the current state of research in NLU and LLMs.

Reviewer:  K R Chowdhary Review #: CR147846
Natural Language (H.5.2 ... )
 
Other reviews under "Natural Language": Date
Designing effective speech interfaces
Weinschenk S., Barker D., John Wiley & Sons, Inc., New York, NY, 2000. 405 pp. Type: Book (9780471375456)
Jun 1 2000
Spoken dialogue technology: enabling the conversational user interface
McTear M. ACM Computing Surveys 34(1): 90-169, 2002. Type: Article
Jul 26 2002
Limitations of concurrency in transaction processing
Franaszek P., Robinson J. ACM Transactions on Database Systems 10(1): 1-28, 1985. Type: Article
Jan 1 1986
