Lost in Translation: Can We Tell Human Poetry From AI?

Author: Denis Avetisyan

A new study challenges our assumptions about creativity and authorship by revealing that Czech speakers struggle to identify poems written by artificial intelligence.

Perceived poem authorship significantly influences assessments of meaning, with interpretations shifting as attribution changes.

Research demonstrates a significant disconnect between perceived authorship and aesthetic judgment in responses to both human- and AI-generated Czech poetry.

While increasingly sophisticated large language models challenge traditional notions of authorship, their creative capacities remain largely unexplored beyond dominant languages like English. This study, ‘The author is dead, but what if they never lived? A reception experiment on Czech AI- and human-authored poetry’, investigates whether native Czech speakers can distinguish between poems written by humans and those generated by AI. Participants struggled to identify authorship correctly, yet they rated poems they believed to be AI-generated less favorably, even though they often preferred those same poems when authorship was unknown. This suggests that aesthetic judgment is closely tied to perceived origin, raising the question of how such biases will shape our engagement with AI-generated art in the future.

The Subtle Art of Poetic Distinction

Assessing poetic merit extends considerably beyond merely determining if a work is logically structured or easily understood. True evaluation necessitates recognizing the significance of more subtle qualities, notably a poem’s capacity for playfulness and the strength of its imaginative leaps. These attributes, though difficult to quantify, are central to the artistic experience, fostering unexpected connections and challenging conventional thought. A poem brimming with inventive imagery and a lighthearted approach to language can resonate deeply, even if it doesn’t adhere to strict narrative or thematic constraints. Indeed, the willingness to embrace ambiguity and explore unconventional ideas often defines a poem’s lasting power, distinguishing it from prose that prioritizes clarity and direct communication. The most compelling poetry, therefore, doesn’t simply convey information; it invites the reader into a realm of possibility, stimulating the senses and igniting the imagination.

Poetic form significantly dictates the prioritization of aesthetic attributes, fundamentally altering reader engagement. Nonsense poetry, for example, deliberately subverts conventional coherence and seriousness, instead celebrating playfulness and imaginative linguistic construction; a reader approaches such verse anticipating illogicality and delighting in its absurdity. Conversely, much of modern poetry, while often eschewing traditional rhyme schemes, may prioritize imagistic density and emotional resonance, demanding a more introspective and analytical engagement. These differing emphases aren’t arbitrary; they actively shape how a reader processes the text, influencing expectations about meaning, emotional impact, and even the very purpose of the poem itself. Consequently, evaluating poetry requires acknowledging that the criteria for success aren’t universal, but are instead contingent upon the specific conventions and priorities established by the chosen form.

Poetic style is significantly delineated by attributes such as rhyme scheme and the degree of seriousness conveyed, both of which powerfully influence how a reader approaches and understands a poem. The presence – or deliberate absence – of rhyme establishes expectations regarding structure and musicality, subtly signaling whether a poem will adhere to traditional forms or venture into more experimental territory. Simultaneously, a poem’s seriousness – ranging from lighthearted whimsy to profound contemplation – sets the emotional tone and guides the reader’s interpretive framework. A playful, unserious poem, even with complex rhyme, invites a different engagement than a somber, free-verse work, demonstrating how these stylistic choices actively shape not just what is communicated, but how it is received and ultimately, understood.

Poem authorship correlates with average imaginativeness, as demonstrated by the data.

Generative Poetics: An Algorithmic Approach

Large Language Models (LLMs), such as GPT-4, use a deep learning architecture known as the transformer to generate text. These models are pre-trained on massive datasets of text and code, which lets them estimate the probability of the next token given the text that precedes it. The transformer architecture relies on self-attention mechanisms, allowing the model to weigh the importance of different words in the input sequence when generating output. Poetry generation draws on this capability: the training data includes diverse poetic forms and styles, and the model generates new text from the statistical relationships it has learned between words and structures. Model scale, measured in parameters, is one major driver of performance in these generative tasks; unconfirmed reports put GPT-4 at roughly $1.76 \times 10^{12}$ parameters, though OpenAI has not published the figure.
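To make the self-attention idea concrete, here is a minimal sketch in plain Python. It is a simplification, not how any production model is implemented: the function names and the toy two-dimensional vectors are assumptions made for illustration, and the learned projection matrices that a real transformer applies to obtain queries, keys, and values are left out.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values.

    Each output vector is a weighted average of all input vectors, where
    the weights reflect how strongly one position "attends" to another.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Blend the value vectors according to the attention weights.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs


# Three toy "word" vectors (hypothetical values, not real embeddings).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
```

Because the weights are positive and sum to 1, every output is a blend of the inputs; this is the sense in which the model "weighs" other words when producing each position.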

The perceived quality of poetry generated by artificial intelligence is directly correlated to its successful emulation of established poetic characteristics. These characteristics include, but are not limited to, consistent metrical patterns, rhyme schemes, and structural elements such as stanzas and line breaks. Beyond formal attributes, the AI must also demonstrate an understanding of semantic coherence, thematic consistency, and the nuanced use of figurative language – including metaphors, similes, and personification – to convey meaning and evoke emotional responses. Successful integration of these formal and semantic elements results in outputs that more closely resemble human-authored poetry and are therefore judged as higher quality.

Rigorous evaluation of AI-generated text requires analysis of both stylistic and semantic features. Stylistic assessment includes metrics such as lexical diversity, sentence complexity – measured by average sentence length and the frequency of complex syntactic structures – and the presence of poetic devices like alliteration, assonance, and rhyme schemes. Semantic evaluation focuses on coherence, consistency, and the meaningfulness of the generated text; this can involve assessing topical relevance, entity recognition, and the logical relationships between statements. Quantitative methods, including perplexity scores and automated readability indices, are often combined with qualitative human evaluation to provide a comprehensive assessment of text quality, ensuring that generated content not only adheres to formal constraints but also conveys understandable and relevant information.
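Two of the quantitative measures mentioned above can be sketched in a few lines. The code below is an illustrative assumption rather than the study's method: lexical diversity is approximated by the type-token ratio (distinct words divided by total words), and perplexity is computed from a list of per-token probabilities that a language model would have to supply. The function names are hypothetical.

```python
import math
import re


def type_token_ratio(text):
    """Lexical diversity as distinct words divided by total words."""
    # \w matches Unicode letters, so Czech diacritics are handled.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token.

    token_probs holds the probability a model assigned to each token that
    actually occurred; obtaining these values requires a language model.
    """
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# Toy inputs (hypothetical): a repeated word lowers lexical diversity.
diversity = type_token_ratio("růže je růže")
# A model that always assigns probability 0.25 behaves like a 4-way guess.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```

The type-token ratio is sensitive to text length, so it is best compared only across poems of similar size.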

Perceived poem authorship significantly influences average liking, indicating a preference based on attributed origin.

Human Versus Machine: Distinguishing the Source

A comparative analysis was conducted to evaluate the quality of poetry generated by artificial intelligence. This methodology involved presenting human participants with a corpus of poetic texts, consisting of both human-authored and AI-generated pieces. These texts were then assessed based on pre-defined attributes, allowing for a direct comparison of stylistic and thematic elements. The resulting data facilitated an objective evaluation of the AI’s ability to produce poetry that aligns with established human creative standards, and formed the basis for subsequent statistical analysis of identification accuracy.

A study evaluating whether readers can distinguish human-written poetry from AI-generated poetry found that Czech native speakers identified the source correctly only 45.8% of the time, below the 50% that pure guessing on a two-way choice would yield. This result indicates that current AI models can generate poetic text in Czech that closely resembles human-authored work. The low identification rate suggests a significant overlap in stylistic and structural features between the two sources, posing a challenge to authorship attribution. The finding highlights the increasing sophistication of AI in creative text generation and its potential to produce outputs that readers cannot tell apart from human writing, at least on initial evaluation.

Analysis of participant performance in authorship attribution revealed a statistically significant disparity based on poetic style. Participants correctly identified the author as human or machine 51.4% of the time when presented with nonsense poetry. However, identification accuracy decreased to 40.2% when evaluating modern poetry. This suggests that the stylistic characteristics of nonsense poetry – often characterized by deliberate illogicality and deviations from conventional grammar – provide more readily discernible cues for human authorship compared to the more nuanced and potentially ambiguous features of modern poetry.
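One way to ask whether an accuracy figure like these differs from coin-flip guessing is an exact two-sided binomial test against a 50% chance level. The sketch below is not the paper's analysis: the number of judgments is not given in this article, so the counts used here are hypothetical placeholders, and the function names are assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_binomial_p(successes, trials, chance=0.5):
    """Exact two-sided p-value against the given chance level.

    Sums the probability of every outcome that is no more likely than
    the observed one under the null hypothesis of pure guessing.
    """
    probabilities = [binomial_pmf(k, trials, chance) for k in range(trials + 1)]
    observed = probabilities[successes]
    # A small tolerance guards against floating-point ties.
    tail = sum(p for p in probabilities if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


# HYPOTHETICAL counts: 229 correct out of 500 judgments is 45.8%.
# The real number of judgments in the study is not stated here.
p_value = two_sided_binomial_p(229, 500)
```

The same percentage can be significant or not depending on how many judgments lie behind it, which is why the raw counts matter as much as the rate. A real analysis would also need to account for repeated judgments by the same participant, which this simple test ignores.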

Determining authorship – whether a text originates from a human or an artificial intelligence – presents significant challenges due to the increasing sophistication of AI in replicating human writing styles. Successful authorship attribution necessitates a detailed examination of stylistic nuances, moving beyond surface-level features to analyze patterns in vocabulary, syntax, and thematic development. Current AI models are capable of generating text that closely mimics the characteristics of human-authored works, particularly in genres with established conventions, thereby obscuring the distinctions traditionally used to identify the author. This complexity is amplified by the subjective nature of stylistic analysis and the potential for AI to learn and adapt its writing style based on large datasets of human text.

Perceived poem authorship correlates with average imaginativeness.

Beyond Authorship: Reconsidering Poetic Interpretation

Roland Barthes’ influential concept of the “Death of the Author” fundamentally shifts the locus of meaning from the creator to the recipient. This theory posits that a text, once released, becomes independent of its author’s intentions; its significance is not determined by what the author meant to convey, but rather by what the reader actively constructs through their unique engagement with the work. Consequently, a poem’s meaning is not a fixed entity waiting to be discovered, but a fluid and dynamic creation born from the interplay between the text itself and the individual reader’s experiences, biases, and interpretive frameworks. This decoupling of intention from interpretation allows for multiple, equally valid readings, freeing the text from the constraints of authorial control and embracing the subjective nature of understanding. The implications of this perspective extend beyond traditional literary criticism, particularly as it challenges established notions of originality and authority in the realm of creative expression.

The emergence of artificial intelligence capable of generating poetry fundamentally disrupts established concepts of authorship and creative ownership. Historically, poetic meaning has been tethered to the author’s biography, intent, and lived experience; however, AI-generated verse, born from algorithms and datasets, lacks this conventional provenance. This challenges the very notion of a singular, identifiable creator, prompting a reconsideration of where meaning truly originates. Instead of seeking authorial intent, interpretation shifts toward the interplay between the text itself, the reader’s subjective response, and the machine’s underlying processes – a dynamic where the ‘author’ becomes a collaborative entity, or perhaps, functionally absent, forcing a focus on the poem as a self-contained system of language and form.

The convergence of text, reader, and machine intelligence is reshaping the landscape of poetic engagement, particularly when considered through the lens of Czech verse. This dynamic interplay moves beyond simply generating poetry with artificial intelligence; it fosters a new form of collaborative interpretation. The reader, no longer solely focused on deciphering authorial intent, becomes an active participant in co-creating meaning with the algorithmic source. Consequently, analysis shifts from biographical or historical contexts to examining how a poem – regardless of its origin – functions as a self-contained system of language, eliciting unique responses from each individual. This approach allows for the exploration of previously unconsidered aesthetic qualities and challenges established canons, revealing how computational creativity can both mirror and diverge from traditional Czech poetic forms and themes, ultimately prompting a re-evaluation of what constitutes poetic value.

Average coherence scores indicate a correlation between text generation and attributed poem authorship.

The study illuminates a curious paradox. Aesthetic judgment, it seems, isn’t solely about the poem itself, but about perceived origin. Participants consistently struggled with authorship attribution, highlighting the increasing sophistication of large language models in mimicking human creativity. Yet belief, or disbelief, in human authorship demonstrably alters enjoyment. As Arthur C. Clarke observed, “Any sufficiently advanced technology is indistinguishable from magic.” This resonates deeply; the ‘magic’ here isn’t simply AI’s ability to generate poetry, but its capacity to deceive regarding its origin, blurring the lines between human and artificial creation. Abstractions age, principles don’t; the principle is that perception shapes reality, even, and perhaps especially, in art.

Further Queries

The difficulty in differentiating between human and machine poetry is not, itself, surprising. Mimicry is a low bar, and fluency a readily achievable illusion. The more pertinent finding – that perceived authorship alters aesthetic judgment – suggests the criteria for evaluation are not intrinsic to the text, but projected onto it. The ‘author’ becomes a container for expectation, a locus of meaning independent of the verse itself.

Future work should address the conditions under which this projection occurs. Does familiarity with an author’s style heighten the bias? Does the belief in human creativity exert a measurable neurological effect during aesthetic processing? The study’s use of nonsense verse was deliberate, minimizing semantic interference, but real-world assessment requires engagement with meaningful content, and all the attendant complexities.

Ultimately, the question is not whether a machine can write a poem, but whether the question itself is useful. The persistence of the ‘author’ as a conceptual necessity may reveal more about the evaluator than the evaluated. Perhaps the true subject of inquiry is not artificial intelligence, but artificial belief.

Original article: https://arxiv.org/pdf/2511.21629.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
