Limited Morphological Complexity
The current system supports only analytic or agglutinative morphology (affixes or space-split words), excluding heavily fusional, templatic (e.g., Arabic root-and-pattern), and polysynthetic systems. This restricts the naturalness and diversity of the generated ConLangs.
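To make the contrast concrete, here is a minimal sketch (not the paper's implementation; all stems and affixes are illustrative assumptions) of the supported agglutinative strategy versus the unsupported fusional one:

```python
# Agglutinative: one dedicated affix per grammatical feature, concatenated
# in a fixed order. This is the only strategy the current system supports.
def agglutinate(stem, suffixes):
    """Attach an ordered list of single-feature suffixes to a stem."""
    return stem + "".join(suffixes)

# Hypothetical Uzbek-like forms: plural "-lar", then accusative "-ni".
print(agglutinate("kitob", ["-lar", "-ni"]))  # kitob-lar-ni

# Fusional: one unsegmentable ending fuses several feature values at once,
# so a paradigm lookup is needed rather than concatenation.
FUSED = {("plural", "accusative"): "-os"}  # Latin-like cumulative exponent
print("serv" + FUSED[("plural", "accusative")])  # serv-os
```

The fusional case cannot be reduced to the concatenative pipeline: the exponent for each feature combination must be stored or derived as a unit.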
English-Parasitic Outputs
The generated ConLangs often retain properties of the English source, such as do-support or auxiliary 'be' in passives, indicating that the LLMs struggle to fully abstract away from the source language's structure.
Incomplete Grammatical Feature Coverage
The morphosyntax module covers only a subset of grammatical features, omitting many found in natural languages (e.g., diminutives, augmentatives, and gender systems beyond masculine/feminine/neuter). Gender assignment for common nouns also remains problematic.
Bias Towards Affixal, Agglutinative Morphology
The system is biased towards a strictly affixal, single-exponent, agglutinative strategy, with no support for paradigmatic variation, which is common in natural languages.
Lack of Integrated Phonological Rules
Beyond phonotactics, a system for creating plausible phonological rule sets that alter word shapes (e.g., assimilation, sound shifts) is still under development and not yet integrated, limiting the realism of language evolution.
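The kind of rule system described as under development can be sketched as a list of ordered rewrite rules; the specific rules and forms below are illustrative assumptions, not the system's actual rule set:

```python
import re

# Ordered phonological rewrite rules as (pattern, replacement) pairs.
RULES = [
    (r"n(?=[pb])", "m"),   # nasal place assimilation: /n/ -> [m] before labials
    (r"t(?=i)", "ch"),     # palatalization: /t/ -> [ch] before /i/
]

def apply_rules(word, rules=RULES):
    """Apply each rewrite rule to the word, in order."""
    for pattern, repl in rules:
        word = re.sub(pattern, repl, word)
    return word

print(apply_rules("tanpa"))  # tampa
print(apply_rules("tila"))   # chila
```

Because later rules see the output of earlier ones, rule ordering itself becomes a design parameter, one reason integrating such a module plausibly requires care.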
LLM Overgeneration in Phonotactics
The LLMs sometimes produce phonotactically odd or unlikely combinations of sounds, indicating a lack of full understanding of natural phonological constraints beyond simple templates.
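The gap between template-licensed and natural-sounding forms can be illustrated with a sketch (the inventory and the extra constraint are hypothetical, chosen only to show that a bare template overgenerates):

```python
import itertools

# A bare CV(C) template admits any onset-vowel-coda combination.
ONSETS = ["p", "t", "k", "s"]
VOWELS = ["a", "i", "u"]
CODAS  = ["", "n", "s", "h"]

def template_syllables():
    """All syllables licensed by the CV(C) template alone."""
    return [o + v + c for o, v, c in itertools.product(ONSETS, VOWELS, CODAS)]

# A hypothetical constraint the template cannot express:
# disallow identical onset and coda consonants in one syllable.
def plausible(syl):
    return not (len(syl) == 3 and syl[0] == syl[2])

all_syls = template_syllables()
filtered = [s for s in all_syls if plausible(s)]
print(len(all_syls), len(filtered))  # 48 48-minus-the-filtered forms
```

Any constraint beyond the template (sonority, co-occurrence restrictions, frequency) must be modeled separately, which is exactly where the LLMs' odd outputs arise.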
Overly Simple Orthographies
The generated orthographies are generally simple, using one-to-one phoneme-grapheme mappings, unlike many older, established orthographies (e.g., English or French), whose spellings diverge significantly from direct phonetic representation due to historical sound changes.
Struggle with Typologically Unusual Features
LLMs perform poorly when dealing with rare or typologically uncommon morphosyntactic configurations and unusual word orders (e.g., Fijian VOS, Mizo OSV), suggesting a bias towards more frequent patterns in their training data.
Negative Impact on Low-Resource Translation
LLM-generated annotations paradoxically led to *lower* BLEU scores for English-Ainu translation compared to unannotated text. This indicates the current annotation system is not yet adequate for aiding low-resource language translation, despite promising results with human-generated annotations.
Lemmatization Difficulties in Analytic Languages
High-performing LLMs struggled to correctly lemmatize words in analytic languages (those with limited inflectional morphology), often leaving words inflected despite instructions, affecting overall correctness.
Wiktionary Data Inconsistency for Evaluation
Wiktionary's phonetic transcriptions vary in level of detail across languages, making robust evaluation of phonotactic models challenging.