Antiquity of the Śivasūtras: Phylogenetic Constraints on the Phonemic Inventory

Abstract

The Śivasūtras (Māheśvara Sūtras), foundational to Pāṇinian grammar in the Aṣṭādhyāyī, provide the phonemic inventory underlying Sanskrit derivations. This study investigates the phylogenetic antiquity of the phonetic strata encoded in these sūtras by triangulating traditional articulatory descriptions from the Pāṇinīya Śikṣā and related Prātiśākhyas with evolutionary evidence from hominin vocal tract anatomy and comparative primatology.

Chronological bounds are established separately for vowels (svaras) and consonants (vyañjanas). The Terminus Post Quem for the quantal vowels (अ, इ, उ, ए, ओ) is anchored in the open-tract vowel-like segments (VLSs) shared with Cercopithecoidea (last common ancestor ~25 Ma), as documented in baboon vocalizations (Boë et al., 2017). The diphthongs ऐ and औ require combinatorial sequencing, an ability that predates the evolution of quantal vowels. The unique vowels ऋ and ऌ, described as non-contact (aspṛśya) articulations by the tongue root at the palate, appear incompatible with the reorganized vocal tract of anatomically modern Homo sapiens that followed laryngeal descent and air-sac loss. The absence of आ and of the full 3×3×2 prosodic grid, together with the retention of a dual ह distinction, further supports a pre–Homo sapiens stratum. For consonants, the Terminus Post Quem aligns with the Homo–Pan divergence (5–7 Ma), as no speech-like consonants are attested in the Pan lineage (Demolin & Delvaux, 2006).

These independent lines of evidence constrain the composite phonemic inventory to a window between approximately 7 Ma and 430 ka, reflecting deep pre-modern human phonetic capacities preserved in the Śivasūtras.

Introduction

Within the Pāṇinian grammatical tradition, the Sanskrit lexicon is structured as a comprehensive system of derived forms. The Aṣṭādhyāyī—a foundational corpus of approximately 4,000 sūtras—executes these derivations by relying on the Śivasūtras (also called Māheśvara Sūtras) as its primary phonemic inventory. While the Śivasūtras themselves are not explicitly mentioned in the Vaidic texts, the specific phonemes (svaras and vyañjanas) and their precise articulatory values have been rigorously preserved through the orthoepic requirements of the Vaidic tradition.

Central to this linguistic architecture is the derivation of the lexicon from dhātus (verbal roots). Because these roots are composed of specific phonemic sequences, the phonemes leave a distinct footprint on all words subsequently derived from them. This downstream morphological evolution, while significant, lies outside the scope of the present study, which focuses exclusively on the antiquity of the phonetic strata systematized within the Śivasūtras.

This study uses paleoanthropological dating based on the anatomical prerequisites for the articulations explicitly described in the Śikṣā and Prātiśākhya texts. By triangulating the phonetic descriptions in these manuals, validated by the Vaidic oral record, with the evolution of hominin vocal tract anatomy and with comparative primatology, we establish a Terminus Post Quem and a Terminus Ante Quem for the emergence of the phonemic strata that ancient tradition transmits and the Śivasūtras systematize.

Scope & Methodology

The Śivasūtras comprise two distinct phonetic sections: vowels (svaras) and consonants (vyañjanas). Chronological bounds for the Śivasūtras are established by evaluating the following for each section:

Terminus Post Quem: Determined by the phylogenetic emergence of the anatomical and articulatory capacities required for the production of individual vowels (svaras) in the hominin lineage. For consonants (vyañjanas), the relevant baseline is the Homo–Pan divergence, as no evidence exists for the independent production of speech-like consonants prior to this split in the hominin lineage. The composite phonemic inventory of the Śivasūtras (vowels + consonants) requires the prior existence of anatomical capacity for both classes of phonemes.
Terminus Ante Quem: Inferred from comparison with articulatory descriptions in the Pāṇinīya Śikṣā and related texts, specifically:

(a) The articulations of the unique svaras are evaluated against paleoanthropological constraints on vocal tract anatomy.
(b) Svaras that are producible with the modern Homo sapiens vocal tract but absent from the Śivasūtras inventory are identified and assessed.

It is currently not possible to establish a Terminus Ante Quem for the consonantal inventory other than the emergence of Homo sapiens itself, owing to the lack of sufficiently detailed and reliable vocal tract reconstructions for earlier Homo species (e.g., Homo erectus). The Terminus Ante Quem for the overall (composite) phonemic inventory is therefore evaluated primarily with reference to the vowel section.
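Combining the per-class bounds amounts to an interval intersection: because the composite inventory needs both vowels and consonants, its window cannot open before the younger of the two Termini Post Quem, and it closes at the Terminus Ante Quem drawn from the vowel section. The sketch below is illustrative only. It assumes dates in millions of years before present (Ma), plugs in the values stated in the abstract, and takes 7 Ma (the upper end of the 5–7 Ma Homo–Pan range) for consonants; the function name `composite_window` is hypothetical.

```python
def composite_window(tpq_by_class: dict[str, float], taq: float) -> tuple[float, float]:
    """Return (TPQ, TAQ) for a composite inventory; all dates in Ma before present."""
    # Every phoneme class must already be producible, so the window cannot open
    # before the youngest (numerically smallest, since larger Ma = older) class TPQ.
    tpq = min(tpq_by_class.values())
    # The window is non-empty only if the TAQ is younger than the TPQ.
    if not taq < tpq:
        raise ValueError(f"empty window: TAQ {taq} Ma is not younger than TPQ {tpq} Ma")
    return tpq, taq


# Values from the abstract; 7.0 is the upper end of the 5-7 Ma Homo-Pan range.
window = composite_window({"vowels": 25.0, "consonants": 7.0}, taq=0.43)
print(window)  # (7.0, 0.43)
```

Because a larger number in Ma means an older date, the "younger" Terminus Post Quem is the numerical minimum.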

The vowel–consonant distinction is not a prerequisite for linguistic structure itself—as evidenced by sign languages, which function as complete systems without this duality. However, when language is instantiated in the vocal–auditory modality, this distinction becomes relevant.

Accordingly, this analysis treats the Pāṇinīya Śikṣā and the Śivasūtras not as artifacts of a historical or grammatical language (Sanskrit), but as a sophisticated phonetic-phonological systematization of the fundamental building blocks of speech. The study views these articulatory categories as internally consistent phonetic constraints, independent of historical authorship.

Phonetic Framework & Evolutionary Inference

Vowels

The vowel inventory in the Śivasūtras comprises three categories: quantal vowels (अ, इ, उ, ए, ओ), derived vowels (ऐ, औ), and unique vowels (ऋ, ऌ). Their production requires specific articulatory and acoustic mechanisms that can be traced through hominin and primate evolution. Phylogenetic mapping of these characters (presence or absence of vowel types and of combinatory abilities) ties the Terminus Post Quem for the vowel system to the emergence of these capacities in the last common ancestor shared with the relevant taxa.
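The phylogenetic mapping described above can be reduced to a simple parsimony rule. This is a minimal sketch under stated assumptions, not a formal phylogenetic analysis: a capacity attested in an outgroup is assumed present in the last common ancestor, and a capacity absent from an outgroup is assumed to have arisen after the split from it (no secondary loss). The scores encode only what the text reports (VLSs in baboons, no speech-like consonants in bonobos); the dictionary and function names are hypothetical.

```python
from typing import Optional

# Divergence of each outgroup from the human lineage, in Ma before present.
# Values follow the text; 7.0 is the upper end of the 5-7 Ma Homo-Pan range.
DIVERGENCE_MA = {"Pan": 7.0, "Cercopithecoidea": 25.0}

# Character scoring as reported in the text; unlisted taxa are left unscored.
CHARACTERS = {
    "open-tract VLSs": {"Cercopithecoidea": True},   # Boë et al. (2017)
    "speech-like consonants": {"Pan": False},        # Demolin & Delvaux (2006)
}


def terminus_post_quem(scores: dict[str, bool]) -> Optional[float]:
    present = [DIVERGENCE_MA[t] for t, has in scores.items() if has]
    if present:
        # Shared with an outgroup: assumed present in the last common ancestor,
        # so the capacity is available from the oldest such split.
        return max(present)
    absent = [DIVERGENCE_MA[t] for t, has in scores.items() if not has]
    if absent:
        # Missing in an outgroup (assuming no secondary loss): the capacity
        # arose after the youngest split from a taxon lacking it.
        return min(absent)
    return None  # no informative outgroup has been scored


for name, scores in CHARACTERS.items():
    print(name, terminus_post_quem(scores))
```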

Quantal vowels (अ, इ, उ, ए, ओ)

Quantal vowels exhibit stable formant patterns (F1/F2 dispersion-focalization) that minimize acoustic-perceptual confusion, as modelled by Stevens (1989) and quantified in simulations of the human supralaryngeal vocal tract (SVT) (Lieberman et al., 1969; Lieberman, 2012). Acoustic analyses of Guinea baboon (Papio papio) vocalizations reveal homologous vowel-like segments (VLSs) matching human [ɨ, æ, ɑ, ɔ, u], organized along two axes: horizontal (tongue advancement: [æ] ⇔ [u, ɔ]) and vertical (tongue height: [ɑ] ⇔ [ɨ]) (Boë et al., 2017). These VLS acoustic regions, [ɑ], [ɨ], [u], [æ], and [ɔ], represent the biological-acoustic substrates for the Sanskrit vowels अ, इ, उ, ए, and ओ, respectively, indicating that the primary coordinates of the Śivasūtras vowel space are anchored in deep-time articulatory capacities. Baboons produce these quantal VLSs despite a high larynx, using tongue musculature (genioglossus, hyoglossus, styloglossus) comparable to that of early hominins. This demonstrates that a stable, quantal-like acoustic substrate can be generated without the descended larynx characteristic of Homo sapiens. VLSs are not vowels in the linguistic sense; they represent the biological-acoustic precursors from which the five primary human quantal vowels later evolved (Boë et al., 2017).

Accordingly, the last common ancestor with Cercopithecoidea (~25 Ma) establishes a Terminus Post Quem based on phylogenetic availability of open-tract VLS capacity, a prerequisite for the emergence of the quantal vowel substrate underlying the Śivasūtras’ vowel inventory. This bound applies to articulatory–acoustic capability alone and does not date the appearance of human-like vowels or symbolic phonological systems, both of which could arise only after this articulatory–acoustic capacity had emerged.
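The SVT simulations cited in this subsection rest on treating the vocal tract as an acoustic tube. As a much simpler illustration than those models, a uniform tube closed at the glottis and open at the lips resonates at odd quarter-wavelength frequencies, F_n = (2n − 1)c / 4L. The tract length of 17.5 cm and speed of sound of 35,000 cm/s are conventional textbook values for an adult male, assumed here; the resulting evenly spaced formants correspond to a neutral, schwa-like configuration from which the quantal vowels depart. The function name is hypothetical.

```python
def uniform_tube_formants(length_cm: float, n_formants: int = 3,
                          c_cm_s: float = 35_000.0) -> list[float]:
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    """
    if length_cm <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * c_cm_s / (4 * length_cm) for n in range(1, n_formants + 1)]


print(uniform_tube_formants(17.5))  # [500.0, 1500.0, 2500.0]
```

Halving the length doubles every resonance (`uniform_tube_formants(8.75)` gives 1000, 3000 and 5000 Hz). What this toy model cannot capture is the change in pharyngeal-to-oral proportions that the text attributes to laryngeal descent, which needs at least a two-tube model.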

Derived vowels (ऐ, औ)

ऐ and औ are traditionally analyzed as diphthongs, i.e., composite vowels involving a dynamic articulatory transition rather than a single steady-state target. They exhibit an acoustic transition between distinct articulatory positions (अ to इ and अ to उ respectively), requiring combinatorial sequencing. Comparative data show that the cognitive ability for such sequencing predates the evolution of quantal vowels: chimpanzees concatenate calls for context-specific meaning (Girard-Buttoz et al., 2025), and Campbell’s monkeys form complex sequences (Ouattara et al., 2009). While the cognitive capacity for sequencing is an ancient trait shared with other taxa, the specific manifestation of ऐ and औ as phonemes produced by combinatorial sequencing is biologically constrained by the prior availability of the underlying quantal vowel building blocks.
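The combinatorial account above treats each diphthong as a transition between two targets that already exist, so the diphthong contributes no new target of its own. A minimal sketch of that idea, assuming linear interpolation in F1–F2 space (real formant transitions are not linear) and illustrative target values roughly matching adult-male [ɑ], [i], [u]; these numbers are assumptions, not measurements of Sanskrit, and the names are hypothetical.

```python
# Illustrative steady-state targets (F1, F2) in Hz, roughly adult-male values
# for [ɑ], [i], [u]. Assumed for illustration only; not Sanskrit data.
TARGETS = {"a": (730.0, 1090.0), "i": (270.0, 2290.0), "u": (300.0, 870.0)}


def diphthong_trajectory(onset: str, offset: str, steps: int = 5) -> list[tuple[float, float]]:
    """Sequence two vowel targets as a linear (F1, F2) glide from onset to offset."""
    if steps < 2:
        raise ValueError("need at least two steps to span onset and offset")
    (f1_a, f2_a), (f1_b, f2_b) = TARGETS[onset], TARGETS[offset]
    path = []
    for k in range(steps):
        t = k / (steps - 1)  # 0.0 at the onset target, 1.0 at the offset target
        path.append((f1_a + t * (f1_b - f1_a), f2_a + t * (f2_b - f2_a)))
    return path


ai = diphthong_trajectory("a", "i")  # ऐ modelled as अ to इ
au = diphthong_trajectory("a", "u")  # औ modelled as अ to उ
print(ai)
```

The only inputs are the two quantal targets and a sequencing step, which mirrors the argument that ऐ and औ presuppose the quantal vowels.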

Unique vowels (ऋ, ऌ)

The Pāṇinīya Śikṣā and other Śikṣā texts describe ऋ and ऌ as vowels. They are listed as distinct from the contact (spṛśya) sounds and are characterized as aspṛśya (non-contact) in their mode of articulation. In contemporary Sanskrit recitation, ऋ and ऌ are typically realized as syllabic rhotic and lateral consonants, conventionally transcribed [r̩] and [l̩], respectively; these realizations involve articulatory contact and thus differ fundamentally from the non-contact (aspṛśya) vowels described in the Śikṣā. In anatomically modern humans, a stable non-contact realization of ऋ and ऌ, as specified in the Śikṣā texts, does not appear to be feasible. The question of how to articulate ऋ and ऌ is itself ancient: Varma (1929, p. 7) notes that the Pāṇinīya Śikṣā and the Śaunakīyā Ṛkprātiśākhya–Śikṣā give inconsistent instructions.

Rigveda–Prātiśākhya 1.41 states:
“ऋकारऌकारावथ षष्ठ ऊष्मा जिह्वामूलीयाः प्रथमश्च वर्गः ।”
This verse identifies the root of the tongue (jihvāmūla) as the articulator for ऋ, ऌ, the sixth ūṣman ≍क (a sound defined in Sanskrit phonology but not relevant to the present discussion), and the first class (prathama varga) of stops: क, ख, ग, घ, and ङ.

Pāṇinīya Śikṣā Sūtra 17 states:
“स्युर्मूर्धन्या ऋटुरषा दन्त्या ऌतुलसाः स्मृताः ।”
That is, the place of articulation (स्थान, sthāna) is the palate for ऋ, the ṭu series (ट, ठ, ड, ढ, ण), र, and ष, and the teeth for ऌ, the tu series (त, थ, द, ध, न), ल, and स.
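Read together, the two quoted sources supply different halves of an articulatory description: Rigveda–Prātiśākhya 1.41 names the articulator, while Pāṇinīya Śikṣā 17 names the place. A minimal sketch of pairing them, restricted to ऋ and ऌ and using transliterated labels for the terms in the verses (mūrdhan for मूर्धन्य, danta for दन्त्य); the dictionary and function names are hypothetical. It makes explicit that the combined reading pairs the tongue root with the palate for ऋ and with the teeth for ऌ.

```python
# Articulator per Rigveda-Prātiśākhya 1.41 and place (sthāna) per
# Pāṇinīya Śikṣā 17, as quoted in the text. Only ऋ and ऌ are encoded.
PRATISAKHYA_ARTICULATOR = {"ऋ": "jihvāmūla", "ऌ": "jihvāmūla"}
SIKSA_PLACE = {"ऋ": "mūrdhan", "ऌ": "danta"}


def combined_description(sound: str) -> tuple[str, str]:
    """Pair the articulator from one source with the place from the other."""
    return PRATISAKHYA_ARTICULATOR[sound], SIKSA_PLACE[sound]


for sound in ("ऋ", "ऌ"):
    print(sound, combined_description(sound))
```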

Articulation by the root of the tongue (jihvāmūla) at the palate (mūrdha) is anatomically restricted in Homo sapiens. This limitation arises from the phylogenetic descent of the larynx, which produced a reorganized vocal tract in which the posterior third of the tongue, the jihvāmūla, lies within the oropharynx rather than the oral cavity proper. As a result, the decoupled vertical and horizontal tongue masses of modern humans cannot achieve the palatal-radical configuration that was possible in ancestors with a near-horizontal, intra-oral tongue.
The phonological stratum reflected by the presence of ऋ and ऌ as vowels in the Śivasūtras is therefore most plausibly associated with a pre–Homo sapiens articulatory regime.

It is generally accepted that lowering of the larynx and tongue coincides with loss of laryngeal air sacs. The sounds produced by early Homo before the descent of the tongue, especially vowels, may also have shown the influence of the laryngeal air sacs.

Because Homo sapiens lacks laryngeal air sacs, any attempt to approximate the open-tract, vowel-like vocalizations of coexisting Homo taxa that retained air sacs would necessarily involve articulatory substitution rather than direct replication. Among the articulatory strategies available to the sapiens vocal tract, rhotic and rhoticized articulations provide the most plausible means of introducing controlled vibratory and spectral complexity while preserving vocalic continuity. This does not imply homology between air-sac–mediated calls and human rhotics; it reflects a constrained approximation imposed by anatomical differences. On this view, the syllabic rhotic and lateral realizations of ऋ and ऌ in contemporary recitation are approximations made under anatomical limitations.

Hyoid-based anatomical analysis indicates that the hyoid bones of the Sima de los Huesos hominins were human-like (Martínez et al., 2008). Modeling studies suggest that laryngeal air sacs were likely absent in these hominins and that the tongue had descended into the pharyngeal tube (de Boer, 2012). Fossil evidence from Arsuaga et al. (2014) places these specimens at approximately 430 ka, which therefore provides the Terminus Ante Quem for the vocalic stratum that contains ऋ and ऌ as vowels.

The proposed phonation of ऋ and ऌ as vowels produced with a higher tongue position and with air sacs is empirically testable: articulatory–acoustic modelling of vowel-like phonation incorporating simulated laryngeal air sacs could evaluate whether the resulting vibratory and spectral properties are best approximated, in a human vocal tract lacking air sacs, by rhotic or rhoticized articulations.

Absence of आ

Traditional Sanskrit grammar treats आ as the dīrgha (long) form of अ, and the long vowel is interpreted as implicit in अ in the Śivasūtras for the sake of brevity. However, Aṣṭādhyāyī 8.4.68 (Pāṇini) mentions the dīrgha of अ as distinct from आ. Moreover, the Śivasūtras list ह twice. If brevity were the only governing principle, the duplicate inclusion of ह would be difficult to explain. The two occurrences of ह thus suggest that the absence of आ may reflect considerations other than brevity.

Notably, आ appears in the Aṣṭādhyāyī but not in the Śivasūtras. This asymmetry is consistent with the possibility that the Śivasūtras predate the stable contrastive use of आ, whereas the Aṣṭādhyāyī postdates it. Accordingly, the absence of आ in the Śivasūtras is compatible with the hypothesis that at least some strata of this system derive from a period preceding anatomically modern human speech capacities.

While it is tempting to claim that producing [ɑː] is impossible without a descended larynx and tongue root, available evidence suggests otherwise. Vowel-like sounds acoustically similar to low vowels are observed both in non-human primates (Boë et al., 2017) and in early human vocal development. However, in vocal tracts lacking a descended larynx and an expanded pharyngeal cavity, the acoustic–perceptual contrast between central and low-back vowel targets is substantially compressed. Developmental acoustic studies show that infant vowel productions occupy a reduced and overlapping F1–F2 space, with weaker differentiation among vowel categories compared to adults (Vorperian & Kent, 2007). Under such anatomical and motor constraints, vowel qualities corresponding to mid-central [ə] and low-back [ɑ] would not yet function as stable, contrastive phonemic targets. From this perspective, the absence of आ in the Śivasūtras may reflect a linguistic stratum predating the stabilization of this contrast.

Absence of 3 × 3 × 2 Grid

Sanskrit vowels are traditionally described as exhibiting 18 varieties, arising from the interaction of three tonal categories (udātta, anudātta, svarita), three durations (hrasva, dīrgha, pluta), and oral versus nasal realization (3 × 3 × 2). These distinctions are not explicitly encoded as contrastive units in the Śivasūtras, which organize vowels primarily by phonological identity rather than prosodic or phonatory modulation. This omission may reflect either formal economy or the inheritance of an earlier linguistic stratum in which such fine-grained temporal, tonal, and nasal contrasts were not yet stabilized as phonemic oppositions. While these distinctions are fully realized and contrastive in Homo sapiens speech, their non-representation in the Śivasūtras is compatible with the hypothesis that the vowel inventory they abstract reflects a pre–Homo sapiens or pre-fully-modern articulatory stage.
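The count of 18 varieties is the Cartesian product of the three traditional dimensions. A minimal sketch that enumerates it; the tuple names are illustrative, and no attempt is made to model which varieties are actually attested for each individual vowel.

```python
from itertools import product

TONES = ("udātta", "anudātta", "svarita")
DURATIONS = ("hrasva", "dīrgha", "pluta")
NASALITY = ("oral", "nasal")

# Every combination of tone, duration and nasality is one traditional variety.
VARIETIES = list(product(TONES, DURATIONS, NASALITY))

print(len(VARIETIES))  # 18
```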

Consonants

Two Types of ह

The Śivasūtras distinguish two articulatorily distinct forms of ह, which are auditorily indistinguishable in modern Homo sapiens. The preservation of this formal distinction is compatible with a pre–Homo sapiens origin, with subsequent loss of perceptual contrast through anatomical change or linguistic leveling.

Absence of ळ

ळ is absent from the Śivasūtras and the Dhātupāṭha, yet appears in later Vaidic usage, where it is accepted in sandhi but not in lexical stem formation. This distribution suggests that ळ emerged or was phonologized after the Śivasūtra-level consonantal inventory had stabilized.

Other consonants

While Boë et al. (2017) demonstrate that the vocalic proto-system—specifically vowel-like segments (VLSs)—is an ancient inheritance traceable to ~25 Ma, Demolin and Delvaux (2006) identify a distinct biological boundary for consonantal production. In their study of bonobos (Pan paniscus), they report no speech-class vowels or consonants, attributing this absence to limited neural control and the lack of dynamically configurable, non-uniform vocal-tract shaping required for controlled strictures. Boë et al. (2017) do not contest this conclusion for consonants and restrict their claims to VLSs alone.

Accordingly, the anatomical Terminus Post Quem for the consonantal section of the Śivasūtras cannot predate the Homo–Pan divergence (5–7 Ma). The Śivasūtras thus preserve a phylogenetic stratification: the vowel section reflects an ancient open-tract capability shared with Pan, while the consonant section encodes the emergence of obstructed-tract capacities characteristic of the post-divergence Hominini lineage.

Conclusion

The Śivasūtras themselves may be of later compilation, but the phonetic strata from which their vowel and consonant inventories are drawn are constrained to a window between approximately 7 Ma and 430 ka. This conclusion rests on independent vowel- and consonant-based evidence:

The treatment of ऋ and ऌ as aspṛśya svaras, articulations no longer attainable in Homo sapiens
The absence of आ, whose stable contrast emerges with the Homo sapiens vocal tract configuration
The absence of the later-expanded set of 18 svara varieties characteristic of Homo sapiens prosody
The retention of a dual ह distinction lost in Homo sapiens
The biological absence of consonantal production capabilities in the Pan lineage

Together, these delimit the vowel system to a deep pre–Homo sapiens stratum and the consonant system to a post–Homo–Pan divergence horizon.

References

Arsuaga, J. L., et al. (2014). Neandertal roots: Cranial and chronological evidence from Sima de los Huesos. Science, 344(6190), 1358–1363. https://pubmed.ncbi.nlm.nih.gov/24948730/
Benton, M. J., & Donoghue, P. C. J. (2007). Paleontological evidence to date the tree of life. Molecular Biology and Evolution, 24(1), 26–53. https://doi.org/10.1093/molbev/msl150
Boë, L. J., et al. (2017). Evidence of a vocalic proto-system in the baboon (Papio papio) suggests pre-hominin speech precursors. PLOS ONE, 12(1), e0169321. https://doi.org/10.1371/journal.pone.0169321
Boë, L. J., et al. (2019). Which way to the dawn of speech?: Reanalyzing half a century of debates and data in light of speech science. Science Advances, 5(12), eaaw3916. https://www.science.org/doi/10.1126/sciadv.aaw3916
de Boer, B. (2012). Loss of air sacs improved hominin speech abilities. Journal of Human Evolution, 62(1), 1–6. https://doi.org/10.1016/j.jhevol.2011.07.007
Demolin, D., & Delvaux, V. (2006). A comparison of the articulatory parameters involved in the production of sound of bonobos and modern humans. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The evolution of language: Proceedings of the 6th International Conference (EVOLANG6) (pp. 67–74). World Scientific. https://www.worldscientific.com/doi/abs/10.1142/9789812774262_0009
Girard-Buttoz, C., et al. (2025). Versatile use of chimpanzee call combinations promotes meaning expansion. Science Advances, 11(3), eadq2879. https://doi.org/10.1126/sciadv.adq2879
Lieberman, P., et al. (1969). Vocal tract limitations on the vowel repertoires of rhesus monkey and other nonhuman primates. Science, 164(3884), 1185–1187. https://www.science.org/doi/10.1126/science.164.3884.1185
Lieberman, P. (2012). Vocal tract anatomy and the origin of speech. In The Oxford handbook of language evolution (pp. 208–220). Oxford University Press.
Manser, M. B. (2001). The acoustic structure of suricates’ alarm calls varies with predator type and the level of response urgency. Proceedings of the Royal Society B: Biological Sciences, 268(1483), 2315–2324. https://doi.org/10.1098/rspb.2001.1773
Martínez, I., et al. (2008). Human hyoid bones from the middle Pleistocene site of the Sima de los Huesos (Sierra de Atapuerca, Spain). Journal of Human Evolution, 54(1), 118–124. https://www.sciencedirect.com/science/article/abs/pii/S0047248407001960
Morton, E. S. (1977). On the occurrence and significance of motivation-structural rules in some bird and mammal sounds. The American Naturalist, 111(981), 855–869. https://doi.org/10.1086/283219
Ouattara, K., et al. (2009). Campbell’s monkeys concatenate vocalizations into context-specific call sequences. Proceedings of the National Academy of Sciences, 106(51), 22026–22031. https://doi.org/10.1073/pnas.0908118106
Pāṇini. Aṣṭādhyāyī. Classical text; no modern edition specified.
Pāṇinīya Śikṣā. Sūtras 4, 17–18. Classical text; no modern edition specified.
Payne, R. S., & McVay, S. (1971). Songs of humpback whales. Science, 173(3997), 585–597. https://doi.org/10.1126/science.173.3997.585
Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17(1–2), 3–45. https://doi.org/10.1016/S0095-4470(19)31520-7
Varma, S. (1929). Critical studies in the phonetic observations of Indian grammarians. Royal Asiatic Society. https://ignca.gov.in/Asi_data/38481.pdf
Vorperian, H. K., & Kent, R. D. (2007). Sound production by the vocal tracts of infants and children: A review and synthesis. In B. J. Peter & R. D. Kent (Eds.), The handbook of phonetic sciences (2nd ed., pp. 191–219). Blackwell.
