A Pre-Homo sapiens Vowel Inventory Preserved in the Śivasūtras: Evidence from Comparative Primatology and Hominin Vocal Tract Evolution

Abstract

The Śivasūtras preserve a phonemic inventory that cannot be fully explained by anatomically modern human vocal tract physiology. We propose a phylogenetically grounded hypothesis: the phonemic system of the Śivasūtras (especially the non-contact articulations of ऋ and ऌ, the absence of आ, the retention of dual ह, and the lack of the full 3×3×2 prosodic grid) reflects a pre-Homo sapiens articulatory regime that was partially lost after laryngeal descent and air-sac reduction. This hypothesis is grounded in (i) articulatory descriptions in the Pāṇinīya Śikṣā and Ṛgveda–Prātiśākhya that are incompatible with modern human anatomy, (ii) fossil and modelling data on hominin vocal tract evolution, and (iii) acoustic evidence from cercopithecoid vowel-like segments (Boë et al., 2017). The proposal makes several falsifiable predictions, concerning both the outcome of articulatory-acoustic modelling of air-sac-mediated vowels and future vocal-tract reconstructions of early Homo.

Introduction

The Śivasūtras (also known as Māheśvara Sūtras), the foundational phonemic inventory of Pāṇinian grammar in the Aṣṭādhyāyī, encode a vowel system that cannot be fully accounted for by the articulatory capacities of anatomically modern Homo sapiens. We propose that this inventory preserves a pre-Homo sapiens phonetic regime, partly lost after laryngeal descent and laryngeal air-sac reduction, and maintained through an unbroken tradition of orthoepic precision in the Indian grammatical lineage.

In the Pāṇinian system, the lexicon derives systematically from verbal roots (dhātus) via phonological rules that rely on the Śivasūtras as the irreducible set of contrastive elements (svaras and vyañjanas). While downstream morphological derivations from these roots are well understood, the antiquity of the phonetic strata themselves has remained unexamined through the lens of hominin vocal tract evolution.

This study integrates traditional articulatory descriptions from the Pāṇinīya Śikṣā and related Prātiśākhyas—validated by the Vedic oral record—with comparative primatology and paleoanthropological evidence on vocal tract reconfiguration. Three independent lines converge on the same conclusion:

The non-contact (aspṛśya) articulations of ऋ and ऌ by the tongue root (jihvāmūla) at the palate (mūrdha), as described in the Śikṣā texts, are incompatible with the reorganized oropharynx and decoupled tongue masses of modern humans.
Acoustic vowel-like segments (VLSs) in cercopithecoids (Boë et al., 2017) provide the phylogenetic substrate for the quantal vowels, while sequencing abilities in extant primates prefigure the diphthongs ऐ and औ.
The absence of आ, the retention of dual ह distinctions, and the lack of the full 3×3×2 prosodic grid indicate a system predating the stabilization of contrasts enabled by full laryngeal descent and pharyngeal expansion.

These observations predict specific acoustic signatures that can be tested through articulatory-acoustic modelling and future reconstructions of early Homo vocal tracts.

This study does not argue for the presence of language; it establishes an anatomically derived terminus ante quem and terminus post quem for the articulatory–acoustic capacities later formalized and preserved in phonemic systems.

Scope & Methodology

The Śivasūtras comprise two distinct phonetic sections: vowels (svaras) and consonants (vyañjanas). Chronological bounds for the Śivasūtras are established by evaluating the following for each section:

Terminus post quem: Determined by the phylogenetic emergence of the anatomical and articulatory capacities required for the production of individual vowels (svaras) in the hominin lineage. For consonants (vyañjanas), the relevant baseline is the Homo–Pan divergence, as no evidence exists for the independent production of speech-like consonants prior to this split in the hominin lineage. The composite phonemic inventory of the Śivasūtras (vowels + consonants) requires the prior existence of anatomical capacity for both classes of phonemes.
Terminus ante quem: Inferred from comparison with articulatory descriptions in the Pāṇinīya Śikṣā and related texts, specifically:

(a) Unique svara articulations are evaluated against vocal tract anatomy, so that paleoanthropological constraints on the relevant anatomy fix the terminus ante quem.
(b) Svaras that are producible with the modern Homo sapiens vocal tract but are absent from the Śivasūtras inventory are identified.

It is currently not possible to establish a terminus ante quem for the consonantal inventory independent of Homo sapiens, due to the lack of sufficiently detailed and reliable vocal tract reconstructions for earlier Homo species (e.g., Homo erectus). Therefore, the terminus ante quem for the overall (composite) phonemic inventory is evaluated primarily with reference to the vowel section.

The vowel–consonant distinction is not a prerequisite for linguistic structure itself—as evidenced by sign languages, which function as complete systems without this duality. However, when language is instantiated in the vocal–auditory modality, this distinction becomes relevant.

Accordingly, this analysis treats the Pāṇinīya Śikṣā and the Śivasūtras not as artifacts of a historical or grammatical language (Sanskrit), but as a sophisticated phonetic-phonological systematization of the fundamental building blocks of speech. The study views these articulatory categories as internally consistent phonetic constraints, independent of historical authorship.

Phonetic Framework & Evolutionary Inference

Vowels

The vowel inventory in the Śivasūtras comprises three categories—quantal vowels (अ, इ, उ, ए, ओ), derived vowels (ऐ, औ), and unique vowels (ऋ, ऌ)—whose production requires specific articulatory and acoustic mechanisms traceable through hominin and primate evolution. Phylogenetic mapping of these characters (presence/absence of vowel types and combinatory abilities) constrains the terminus post quem for the vowel system’s compilation to the emergence of these capacities in the last common ancestor with relevant taxa.

Quantal vowels (अ, इ, उ, ए, ओ)

Quantal vowels exhibit stable formant patterns (F1/F2 dispersion-focalization) that minimize acoustic-perceptual confusion, as modelled by Stevens (1989) and quantified in human supralaryngeal vocal tract (SVT) simulations (Lieberman et al., 1969; Lieberman, 2012). Acoustic analyses of Guinea baboon (Papio papio) vocalizations reveal homologous vowel-like segments (VLSs) matching human [ɨ, æ, ɑ, ɔ, u], organized along two axes: horizontal (tongue advancement: [æ] ⇔ [u, ɔ]) and vertical (tongue height: [ɑ] ⇔ [ɨ]) (Boë et al., 2017). These VLS acoustic regions—[ɑ], [ɨ], [u], [æ], and [ɔ]—represent the biological-acoustic substrates for the Sanskrit vowels अ, इ, उ, ए, and ओ respectively, illustrating that the primary coordinates of the Śivasūtras vowel space are anchored in deep-time articulatory capacities. Baboons achieve quantal VLSs despite a high larynx, using tongue musculature (genioglossus, hyoglossus, styloglossus) comparable to early hominins. This demonstrates that a stable, quantal-like acoustic substrate can be generated without the descended larynx characteristic of Homo sapiens. Such vowel-like segments (VLSs) are not vowels in the linguistic sense, but represent the biological-acoustic precursors from which the five primary human quantal vowels later evolved (Boë et al., 2017).

Accordingly, the last common ancestor with Cercopithecoidea (~25 Ma) establishes a terminus post quem based on phylogenetic availability of open-tract VLS capacity, a prerequisite for the emergence of the quantal vowel substrate underlying the Śivasūtras’ vowel inventory. This bound applies to articulatory–acoustic capability alone and does not date the appearance of human-like vowels or symbolic phonological systems, both of which could arise only after this articulatory–acoustic capacity had emerged.

Derived vowels (ऐ, औ)

ऐ and औ are traditionally analyzed as diphthongs, i.e., composite vowels involving a dynamic articulatory transition rather than a single steady-state target. They exhibit an acoustic transition between distinct articulatory positions (अ to इ and अ to उ respectively), requiring combinatorial sequencing. Comparative data show that the cognitive ability for such sequencing predates the evolution of quantal vowels: chimpanzees concatenate calls for context-specific meaning (Girard-Buttoz et al., 2025), and Campbell’s monkeys form complex sequences (Ouattara et al., 2009). While the cognitive capacity for sequencing is an ancient trait shared with other taxa, the specific manifestation of ऐ and औ as phonemes produced by combinatorial sequencing is biologically constrained by the prior availability of the underlying quantal vowel building blocks.
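The dependency described above, in which derived vowels presuppose their quantal components, can be illustrated with a minimal sketch. The correspondences are those stated in the text; the names QUANTAL, DERIVED, and available_derived are hypothetical and introduced only for illustration.

```python
# VLS acoustic regions and the vowels they are mapped to in the text
# (regions as reported by Boë et al., 2017).
QUANTAL = {"ɑ": "अ", "ɨ": "इ", "u": "उ", "æ": "ए", "ɔ": "ओ"}

# Derived vowels modelled as ordered transitions between two quantal targets.
DERIVED = {"ऐ": ("अ", "इ"), "औ": ("अ", "उ")}


def available_derived(quantal_vowels):
    """Return the derived vowels whose components are all available."""
    have = set(quantal_vowels)
    return [vowel for vowel, parts in DERIVED.items() if set(parts) <= have]


# With the full quantal set, both diphthongs become possible.
full = available_derived(QUANTAL.values())

# Without उ, only ऐ (अ to इ) can be formed.
partial = available_derived(["अ", "इ"])
```

The sketch encodes only the ordering constraint: a derived vowel cannot be listed before its building blocks exist. It says nothing about the cognitive sequencing ability itself.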

Unique vowels (ऋ, ऌ)

ऋ and ऌ are described in the Pāṇinīya Śikṣā and other Śikṣā texts as vowels. They are listed as distinct from contact (spṛśya) sounds, being characterized as aspṛśya (non-contact) in their mode of articulation. In contemporary Sanskrit recitation, ऋ and ऌ are typically realized as syllabic rhotic and lateral consonants, conventionally transcribed as [r̩] and [l̩], respectively; these realizations involve articulatory contact and thus differ fundamentally from the non-contact (aspṛśya) vowels described in the Śikṣā. In anatomically modern humans, the stable realization of ऋ and ऌ in this non-contact manner, as specified in the Śikṣā texts, is anatomically impossible. The question of how to articulate ऋ and ऌ is ancient: the Pāṇinīya Śikṣā and the Śaunakīyā Ṛkprātiśākhya–Śikṣā appear to give inconsistent instructions (Varma, 1929, p. 7).

Ṛgveda–Prātiśākhya 1.41 states:
“ऋकारऌकारावथ षष्ठ ऊष्मा जिह्वामूलीयाः प्रथमश्च वर्गः ।”
The root of the tongue (jihvāmūla) is the articulator for ऋ, ऌ, ≍क (another sound defined in Sanskrit phonology that is not relevant for our discussion here) and the prathama varga (क, ख, ग, घ, and ङ).

Pāṇinīya Śikṣā Sūtra 17 states:
“स्युर्मूर्धन्या ऋटुरषा दन्त्या ऌतुलसाः स्मृताः ।”
The place (स्थान) of articulation is the palate for ऋ, टु (ट, ठ, ड, ढ, and ण) र and ष and teeth for ऌ, तु (त, थ, द, ध, and न) ल and स.

The articulation by the root of the tongue (jihvāmūla) at the palate (mūrdha) is anatomically restricted in Homo sapiens. This limitation arises from the phylogenetic descent of the larynx, which resulted in a reorganized vocal tract where the posterior third of the tongue, the jihvāmūla, is sequestered within the oropharynx rather than the oral cavity proper. As a result, the decoupled vertical and horizontal tongue masses in modern humans cannot achieve the specific palatal-radical configuration that was possible in ancestors with a near-horizontal, intra-oral tongue.
The phonological stratum reflected by the presence of ऋ and ऌ as vowels in the Śivasūtras therefore belongs to a pre–Homo sapiens articulatory regime.

It is generally accepted that lowering of the larynx and tongue coincides with loss of laryngeal air sacs. The sounds produced by early Homo before the descent of the tongue, especially vowels, should have been influenced by the laryngeal air sacs.

Given the absence of laryngeal air sacs in Homo sapiens, any attempt to approximate open-tract, vowel-like vocalizations produced by coexisting Homo taxa with air sacs would necessarily involve articulatory substitution rather than direct replication. Among the available articulatory strategies in the sapiens vocal tract, rhotic and rhoticized articulations provide the most plausible means of introducing controlled vibratory and spectral complexity while preserving vocalic continuity. This does not imply homology between air-sac–mediated calls and human rhotics, but rather reflects a constrained approximation imposed by anatomical differences. In contemporary Sanskrit recitation, ऋ and ऌ are typically realized as syllabic rhotic and lateral consonants as approximations under anatomical limitations.

Hyoid-based anatomical analysis indicates that the hyoids from Sima de los Huesos Homo were human-like (Martínez et al., 2008). Modelling studies suggest that laryngeal air sacs were likely absent and that the tongue had descended into the pharyngeal tube (de Boer, 2012). Arsuaga et al. (2014) provide the chronostratigraphic context, dating these specimens to approximately 430 ka.

This proposed phonation of ऋ and ऌ as vowels produced with a higher tongue position and air sacs is empirically testable: articulatory–acoustic modelling of vowel-like phonation incorporating simulated laryngeal air sacs could evaluate whether the resulting vibratory and spectral properties are best approximated, in a human vocal tract lacking air sacs, by rhotic or rhoticized articulations.
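As a point of reference for such modelling, the simplest textbook baseline is a uniform tube closed at the glottis and open at the lips, whose resonances follow the quarter-wavelength formula F_n = (2n - 1)c / (4L). The sketch below computes only this baseline; it does not model air sacs, which a full study would have to add as an additional cavity coupled to the tract. The tube length of 0.17 m and the speed of sound of 350 m/s are assumed illustrative values, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air
TRACT_LENGTH_M = 0.17       # assumed adult human vocal tract length


def uniform_tube_formants(length_m, n_formants=3, c=SPEED_OF_SOUND_M_S):
    """Quarter-wavelength resonances F_n = (2n - 1) * c / (4 * L), in Hz."""
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]


# Baseline resonances of the unperturbed, air-sac-free tube.
baseline = uniform_tube_formants(TRACT_LENGTH_M)
```

A modelling test of the hypothesis would compare the spectra of an air-sac-coupled tract against perturbations of this kind of baseline; the uniform tube is only the starting point, not the model itself.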

Absence of आ

Traditional Sanskrit grammar treats आ as दीर्घ अ, which is interpreted as implicit in the अ of the Śivasūtras for the sake of brevity. However, Aṣṭādhyāyī 8.4.68 mentions the dīrgha of अ as distinct from आ (Pāṇini). The Śivasūtras also mention ह twice. If brevity were the only governing principle, this duplicate inclusion of ह would be difficult to explain. The presence of two occurrences of ह thus suggests that the absence of आ may reflect considerations other than mere brevity.

Notably, आ appears in the Aṣṭādhyāyī but not in the Śivasūtras. This asymmetry is consistent with the possibility that the Śivasūtras predate the stable contrastive use of आ, whereas the Aṣṭādhyāyī postdates it. Accordingly, the absence of आ in the Śivasūtras is compatible with the hypothesis that at least some strata of this system derive from a period preceding anatomically modern human speech capacities.

It could be objected that producing [ɑː] is impossible without a descended larynx and tongue root, but the available evidence suggests otherwise. Vowel-like sounds acoustically similar to low vowels are observed both in non-human primates (Boë et al., 2017) and in early human vocal development. However, in vocal tracts lacking a descended larynx and an expanded pharyngeal cavity, the acoustic–perceptual contrast between central and low-back vowel targets is substantially compressed. Developmental acoustic studies show that infant vowel productions occupy a reduced and overlapping F1–F2 space, with weaker differentiation among vowel categories compared to adults (Vorperian & Kent, 2007). Under such anatomical and motor constraints, vowel qualities corresponding to mid-central [ə] and low-back [ɑ] would not yet function as stable, contrastive phonemic targets. From this perspective, the absence of आ in the Śivasūtras may reflect a linguistic stratum predating the stabilization of this contrast.

Absence of 3 × 3 × 2 Grid

Sanskrit vowels are traditionally described as exhibiting 18 varieties, arising from the interaction of three tonal categories (udātta, anudātta, svarita), three durations (hrasva, dīrgha, pluta), and oral versus nasal realization (3 × 3 × 2). These distinctions are not explicitly encoded as contrastive units in the Śivasūtras, which organize vowels primarily by phonological identity rather than prosodic or phonatory modulation. This omission may reflect either formal economy or the inheritance of an earlier linguistic stratum in which such fine-grained temporal, tonal, and nasal contrasts were not yet stabilized as phonemic oppositions. While these distinctions are fully realized and contrastive in Homo sapiens speech, their non-representation in the Śivasūtras supports the hypothesis that the vowel inventory they abstract reflects a pre–Homo sapiens or pre-fully-modern articulatory stage.
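The count of 18 follows directly from the Cartesian product of the three dimensions named above, as the following minimal sketch shows. The labels "oral" and "nasal" and the variable names are illustrative choices.

```python
from itertools import product

TONES = ("udātta", "anudātta", "svarita")
DURATIONS = ("hrasva", "dīrgha", "pluta")
NASALITY = ("oral", "nasal")

# Every combination of tone, duration, and nasality for a single vowel.
VARIETIES = list(product(TONES, DURATIONS, NASALITY))

# 3 tones x 3 durations x 2 nasality values.
variety_count = len(VARIETIES)
```

Each tuple in VARIETIES is one traditionally described variety of a vowel; none of these 18 combinations appears as a separate unit in the Śivasūtras.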

Consonants

Two Types of ह

The Śivasūtras distinguish two articulatorily distinct forms of ह, which are auditorily indistinguishable in modern Homo sapiens. The preservation of this formal distinction indicates a pre–Homo sapiens origin, with subsequent loss of perceptual contrast through anatomical change or linguistic leveling.

Absence of ळ

ळ is absent from the Śivasūtras and the Dhātupāṭha, yet appears in later Vedic usage, where it is accepted in sandhi but not in lexical stem formation. This distribution suggests that ळ emerged or was phonologized after the Śivasūtra-level consonantal inventory had stabilized.

Other consonants

While Boë et al. (2017) demonstrate that the vocalic proto-system—specifically vowel-like segments (VLSs)—is an ancient inheritance traceable to ~25 Ma, Demolin and Delvaux (2006) identify a distinct biological boundary for consonantal production. In their study of bonobos (Pan paniscus), they report no speech-class vowels or consonants, attributing this absence to limited neural control and the lack of dynamically configurable, non-uniform vocal-tract shaping required for controlled strictures. Boë et al. (2017) do not contest this conclusion for consonants and restrict their claims to VLSs alone.

Accordingly, the anatomical terminus post quem for the consonantal section of the Śivasūtras cannot predate the Homo–Pan divergence (5–7 Ma). The Śivasūtras thus preserve a phylogenetic stratification: the vowel section reflects an ancient open-tract capability shared with Pan, while the consonant section encodes the emergence of obstructed-tract capacities characteristic of the post-divergence Hominini lineage.

Conclusion

The Śivasūtras preserve a phonemic inventory that reflects articulatory capacities predating the vocal tract reconfiguration of anatomically modern Homo sapiens. The vowel system cannot be reconciled with the descended larynx, expanded pharynx, and air-sac loss characteristic of Homo sapiens, particularly in view of the non-contact (aspṛśya) realizations of ऋ and ऌ, the absence of आ as a contrastive low-back long vowel, the retention of dual ह distinctions, and the omission of the full 3×3×2 prosodic grid. Instead, these features converge on a pre-sapiens phonetic regime, anchored phylogenetically by open-tract vowels predating the descent of the larynx (430 ka) and by obstructed-tract capacities emerging after the Homo–Pan divergence (~5–7 Ma).

The terminus ante quem for the phonemic inventory preserved in the Śivasūtras is 430 ka, and the terminus post quem is 7 Ma.
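The arithmetic behind these bounds can be sketched as follows, under the reading that the composite inventory requires both the vowel and the consonant capacities, so that the younger of the two section-level termini post quem is the binding one. The ages are those given in the text, with the upper estimate of 7 Ma taken for the Homo–Pan divergence; the names Bound, composite_tpq, and window are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    label: str
    age_ka: float  # thousands of years before present


VOWEL_TPQ = Bound("last common ancestor with Cercopithecoidea", 25_000.0)
CONSONANT_TPQ = Bound("Homo-Pan divergence (upper estimate)", 7_000.0)
TAQ = Bound("Sima de los Huesos hominins", 430.0)


def composite_tpq(*bounds):
    """The composite inventory needs every capacity, so the youngest bound binds."""
    return min(bounds, key=lambda b: b.age_ka)


def window(tpq, taq):
    """Return (oldest, youngest) ages in ka, checking that the interval is valid."""
    if taq.age_ka > tpq.age_ka:
        raise ValueError("terminus ante quem is older than terminus post quem")
    return (tpq.age_ka, taq.age_ka)


interval = window(composite_tpq(VOWEL_TPQ, CONSONANT_TPQ), TAQ)
```

Under these assumptions the interval runs from 7,000 ka to 430 ka; the 25 Ma vowel bound does not constrain the composite inventory because the consonant bound is younger.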

Independent lines of evidence support this conclusion:

The tongue-root articulation of ऋ and ऌ at the palate, as described in the Pāṇinīya Śikṣā and Ṛgveda-Prātiśākhya, requires an intra-oral tongue configuration incompatible with modern human anatomy.
Acoustic substrates for the quantal vowels appear in cercopithecoid vocalizations, while combinatorial sequencing predates quantal stability in extant primates.
The absence of contrastive आ and the prosodic grid indicates a stage before stabilization of low-back distinctions and fine temporal/tonal modulation enabled by full descent.
The retention of dual ह distinctions points to perceptual contrasts that Homo sapiens is no longer able to perceive.
Consonantal production boundaries post-date the Homo–Pan split, as no speech-like consonants are attested in the Pan lineage. This gives the terminus post quem and places the regime within the Homo lineage and not earlier.

These constraints place the preserved phonetic stratum between approximately 7 Ma and 430 ka. The proposal generates falsifiable acoustic predictions, most notably that air-sac-mediated vowels would have rhotic-like vibratory and spectral properties, best approximated today by syllabic rhotics and laterals. These predictions can be tested through articulatory-acoustic modelling and future reconstructions of early Homo vocal tracts.

References

Arsuaga, J. L., et al. (2014). Neandertal roots: Cranial and chronological evidence from Sima de los Huesos. Science, 344(6190), 1358–1363. https://pubmed.ncbi.nlm.nih.gov/24948730/
Benton, M. J., & Donoghue, P. C. J. (2007). Paleontological evidence to date the tree of life. Molecular Biology and Evolution, 24(1), 26–53. https://doi.org/10.1093/molbev/msl150
Boë, L. J., et al. (2017). Evidence of a vocalic proto-system in the baboon (Papio papio) suggests pre-hominin speech precursors. PLOS ONE, 12(1), e0169321. https://doi.org/10.1371/journal.pone.0169321
Boë, L. J., et al. (2019). Which way to the dawn of speech?: Reanalyzing half a century of debates and data in light of speech science. Science Advances, 5(12), eaaw3916. https://www.science.org/doi/10.1126/sciadv.aaw3916
de Boer, B. (2012). Loss of air sacs improved hominin speech abilities. Journal of Human Evolution, 62(1), 1–6. https://doi.org/10.1016/j.jhevol.2011.07.007
Demolin, D., & Delvaux, V. (2006). A comparison of the articulatory parameters involved in the production of sound of bonobos and modern humans. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The evolution of language: Proceedings of the 6th International Conference (EVOLANG6) (pp. 67–74). World Scientific. https://www.worldscientific.com/doi/abs/10.1142/9789812774262_0009
Girard-Buttoz, C., et al. (2025). Versatile use of chimpanzee call combinations promotes meaning expansion. Science Advances, 11(3), eadq2879. https://doi.org/10.1126/sciadv.adq2879
Lieberman, P., et al. (1969). Vocal tract limitations on the vowel repertoires of rhesus monkey and other nonhuman primates. Science, 164(3884), 1185–1187. https://www.science.org/doi/10.1126/science.164.3884.1185
Lieberman, P. (2012). Vocal tract anatomy and the origin of speech. In The Oxford handbook of language evolution (pp. 208–220). Oxford University Press.
Manser, M. B. (2001). The acoustic structure of suricates’ alarm calls varies with predator type and the level of response urgency. Proceedings of the Royal Society B: Biological Sciences, 268(1483), 2315–2324. https://doi.org/10.1098/rspb.2001.1773
Martínez, I., et al. (2008). Human hyoid bones from the middle Pleistocene site of the Sima de los Huesos (Sierra de Atapuerca, Spain). Journal of Human Evolution, 54(1), 118–124. https://www.sciencedirect.com/science/article/abs/pii/S0047248407001960
Morton, E. S. (1977). On the occurrence and significance of motivation-structural rules in some bird and mammal sounds. The American Naturalist, 111(981), 855–869. https://doi.org/10.1086/283219
Ouattara, K., et al. (2009). Campbell’s monkeys concatenate vocalizations into context-specific call sequences. Proceedings of the National Academy of Sciences, 106(51), 22026–22031. https://doi.org/10.1073/pnas.0908118106
Pāṇini Aṣṭādhyāyī. Classical text, no modern edition specified.
Pāṇinīya Śikṣā. sūtras 4, 17–18. Classical text, no modern edition specified.
Payne, R. S., & McVay, S. (1971). Songs of humpback whales. Science, 173(3997), 585–597. https://doi.org/10.1126/science.173.3997.585
Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17(1–2), 3–45. https://doi.org/10.1016/S0095-4470(19)31520-7
Varma, S. (1929). Critical studies in the phonetic observations of Indian grammarians. Royal Asiatic Society. https://ignca.gov.in/Asi_data/38481.pdf
Vorperian, H. K., & Kent, R. D. (2007). Sound production by the vocal tracts of infants and children: A review and synthesis. In B. J. Peter & R. D. Kent (Eds.), The handbook of phonetic sciences (2nd ed., pp. 191–219). Blackwell.
