This account first appeared in Quanta Magazine. Scholars and philosophers have long debated which abilities truly set human beings apart from all other forms of life, and for centuries language has stood at the forefront of that inquiry. From the time of Aristotle, who famously described humanity as ‘the animal endowed with language,’ the notion that speech and linguistic expression uniquely define the human condition has remained deeply influential. The question has gained new urgency with the emergence of expansive artificial intelligence systems capable of generating convincing text. As models such as ChatGPT emulate human conversation with remarkable fluency, researchers are investigating whether certain features of human linguistic thought remain beyond the reach of both nonhuman animals and artificial entities, or whether machines can in fact mirror the cognitive intricacies underlying our use of words.
One especially tantalizing dimension of this investigation concerns metalinguistic reasoning: the ability to reason about language itself rather than merely use it. Within the linguistics community, opinions diverge sharply on this point. A number of prominent thinkers contend that such reasoning is not merely absent in artificial systems but fundamentally impossible for them to achieve. The linguist Noam Chomsky and his collaborators articulated this position emphatically in a 2023 opinion essay in The New York Times, arguing that the genuine scientific explanations for linguistic behavior are profoundly intricate and cannot be distilled from data exposure alone, no matter how vast the data set. In other words, a model immersed in endless quantities of text might learn to manipulate language convincingly but would lack the deeper analytical awareness required to grasp its structural logic or creative generativity.
A recent paper, however, challenged that long-held assumption. Its authors, Gašper Beguš, a linguist at the University of California, Berkeley; his former doctoral student Maksymilian Dąbkowski, also of Berkeley; and Ryan Rhodes of Rutgers University, tested a range of state-of-the-art large language models (LLMs) with a carefully constructed series of linguistic assessments. The tasks were intended not simply to measure fluency but to evaluate whether the models could infer grammatical principles and construct abstract representations of linguistic systems. One remarkable part of the experiment prompted a model to deduce the grammatical rules of a fabricated, entirely novel language, a test designed to ensure that no preexisting knowledge could bias the results. Most of the systems faltered, unable to replicate the intricate forms of syntactic reasoning that characterize human cognition, but one model performed strikingly well. It demonstrated analytical skills approaching the competence of a graduate student in linguistics, identifying hierarchical sentence structures, resolving ambiguities in meaning, and accurately applying advanced grammatical mechanisms such as recursion. This outcome, Beguš remarked, ‘challenges our understanding of what AI can do,’ suggesting that the boundary between human and artificial reasoning may not be as fixed as once presumed.
The study’s significance was not lost on other experts. Tom McCoy, a computational linguist at Yale University who was not involved in the project, emphasized its relevance: societies increasingly rely on artificial intelligence systems for communication, information processing, and knowledge generation, so understanding both the capabilities and the limitations of these tools has become ever more important. McCoy suggested that linguistic inquiry provides an ideal experimental domain for examining whether such models can truly approximate human reasoning, since language inherently embodies the complexities of logic, meaning, and abstract thought.
Yet designing experiments that rigorously test this hypothesis presents immense challenges. One primary difficulty lies in ensuring that the models do not simply recall answers from their training data. Modern language models absorb vast corpora drawn from the internet, encompassing billions of words across many languages and disciplines, including, in many cases, entire linguistics textbooks and research papers. Given this exhaustive exposure, the risk that a model merely reproduces memorized content rather than demonstrating genuine understanding is considerable.
To address this issue, Beguš and his colleagues meticulously devised a four-part evaluation designed to circumvent pretraining contamination. In three of the four sections, the researchers required the models to construct syntactic tree diagrams: visual representations of sentence structure that trace the hierarchical relationships among words and phrases. Such diagrams, a cornerstone of theoretical linguistics since Chomsky’s seminal 1957 work Syntactic Structures, depict the underlying architecture of a sentence by segmenting it into constituents such as noun and verb phrases, which subdivide in turn into nouns, verbs, adjectives, adverbs, prepositions, and conjunctions. This approach allows linguists to map not just the surface arrangement of words but the deeper patterns that govern meaning and grammatical coherence.
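To make the idea concrete, here is a minimal sketch, not drawn from the study itself, of how such a constituency tree might be represented and printed in code. The Node class and its labels (S, NP, VP, and so on) are illustrative conventions, not anything the researchers’ tests prescribe.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node in a constituency tree: a syntactic label plus its children.

    Leaf nodes carry a word; internal nodes carry a phrase or
    part-of-speech label (S, NP, VP, Det, N, V, Adj, ...).
    """
    label: str
    children: list["Node"] = field(default_factory=list)
    word: str | None = None

def render(node: Node, indent: int = 0) -> str:
    """Print the tree top-down, indenting one step per level of hierarchy."""
    pad = "  " * indent
    if node.word is not None:
        return f"{pad}{node.label} -> {node.word}"
    lines = [f"{pad}{node.label}"]
    lines += [render(child, indent + 1) for child in node.children]
    return "\n".join(lines)

# 'The sky is blue' as a simplified constituency tree:
#   S -> NP VP;  NP -> Det N;  VP -> V Adj
tree = Node("S", [
    Node("NP", [Node("Det", word="The"), Node("N", word="sky")]),
    Node("VP", [Node("V", word="is"), Node("Adj", word="blue")]),
])

print(render(tree))
```

Running the sketch prints the sentence’s hierarchy level by level, with the sentence node S branching into a noun phrase and a verb phrase, each of which resolves into individual words.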
A particularly illuminating component of the test centered on recursion, the ability to embed phrases within other phrases indefinitely. A simple declarative sentence such as ‘The sky is blue’ stands at one end of the complexity spectrum. Slightly more elaborate, ‘Jane said that the sky is blue’ embeds one clause within another. In principle, this nesting can continue without bound, as in the extended yet grammatically correct sentence ‘Maria wondered if Sam knew that Omar heard that Jane said that the sky is blue.’ This capacity for unbounded recursive embedding is among the features that many scholars consider quintessentially human, capturing our aptitude for hierarchical and symbolic thought. By examining whether language models could recognize and reproduce such recursive structures, Beguš and his team sought to determine whether artificial systems are beginning to approximate not only the superficial form but also the generative depth of human linguistic cognition.
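The unboundedness of this nesting is easy to illustrate computationally. The sketch below is a hypothetical illustration, not part of the researchers’ test battery: it builds the article’s example sentence by recursively wrapping a base clause in reporting clauses, with each recursive call adding one level of embedding and nothing in the function limiting the depth.

```python
def embed(reporters: list[tuple[str, str]], base: str) -> str:
    """Recursively wrap a base sentence in reporting clauses.

    Each (subject, verb-plus-complementizer) pair adds one level of
    embedding, mirroring how clauses can nest inside clauses without bound.
    """
    if not reporters:
        return base  # base case: the innermost sentence
    subject, verb = reporters[0]
    return f"{subject} {verb} {embed(reporters[1:], base)}"

levels = [
    ("Maria", "wondered if"),
    ("Sam", "knew that"),
    ("Omar", "heard that"),
    ("Jane", "said that"),
]

print(embed([], "the sky is blue"))           # zero levels of embedding
print(embed(levels[-1:], "the sky is blue"))  # one level: 'Jane said that ...'
print(embed(levels, "the sky is blue"))       # four levels, as in the text
```

With all four levels applied, the function reproduces the article’s example exactly: ‘Maria wondered if Sam knew that Omar heard that Jane said that the sky is blue.’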
Source: https://www.wired.com/story/in-a-first-ai-models-analyze-language-as-well-as-a-human-expert/