A Computational Dialectology Approach to Mapping Bidayuhic Varieties in Tayan Hulu Using Gabmap

Authors

  • Dedy Ari Asfar, Tanjungpura University
  • Syarifah Lubna, National Research and Innovation Agency (BRIN)
  • Irmayani Abdulmalik, National Research and Innovation Agency (BRIN)
  • Wiwin Erni Siti Nurlina, National Research and Innovation Agency (BRIN)
  • Edi Setiyanto, National Research and Innovation Agency (BRIN)
  • Sutarsih, National Research and Innovation Agency (BRIN)
  • Yusup Irawan, National Research and Innovation Agency (BRIN)
  • Binar Kurniasari Febrianti, National Research and Innovation Agency (BRIN)
  • Yeni Yulianti, National Research and Innovation Agency (BRIN)
  • Sarwo Ferdi Wibowo, National Research and Innovation Agency (BRIN)
  • Febyasti Davela Ramadini, National Research and Innovation Agency (BRIN)
  • Ajeng Rahayu Tjaraka, National Research and Innovation Agency (BRIN)
  • Prima Duantika, Balai Bahasa Provinsi Kalimantan Barat

DOI:

https://doi.org/10.17507/jltr.1702.09

Keywords:

Bidayuhic, Austronesian language, dialectometry, Gabmap, computational dialectology

Abstract

This study examines linguistic variation in the Bidayuhic language of Tayan Hulu, West Kalimantan, Indonesia, using a computational dialectology approach implemented in Gabmap. Levenshtein distance is applied to 491 lexical items from six observation sites to measure lexical and phonological differences. The findings show that Bidayuhic forms a linguistic continuum in which dialectal variation does not fully align with geographical boundaries; instead, lexical and phonological differences are influenced by language contact, social mobility, and cultural interaction. The study identifies the merger of the Proto-Malayo-Polynesian (PMP) phonemes *R and *l, realized as /r/, /ɣ/, or /h/, as a phonological innovation in Bidayuhic. Ablaut is also observed in verb morphology, where it distinguishes transitive from intransitive verb forms. Cluster analysis using multidimensional scaling (MDS) and probabilistic clustering revealed two main groups while confirming that variation is gradual rather than regionally segmented. The clustering remained stable even with a noise level of 0.8 in the probabilistic clustering, which supports the effectiveness of Gabmap for dialect classification. These results indicate that Bidayuhic variation is shaped more by sociolinguistic interaction than by geographical factors. The study highlights the role of Gabmap in linguistic mapping and offers a methodological model for mapping local languages in Indonesia.
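
The Levenshtein-based measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline or Gabmap's implementation: the function names, the normalization by the longer form, and the toy word lists (invented placeholder forms, not Bidayuhic data) are all assumptions made for illustration. Each form is treated as a plain sequence of characters, so real IPA transcriptions with diacritics would first need to be split into segments.

```python
from itertools import combinations


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions (cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution (0 if segments match)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Assumed normalization: divide by the length of the longer form."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def site_distances(wordlists):
    """Mean normalized distance over shared concepts for every pair of sites."""
    result = {}
    for s1, s2 in combinations(sorted(wordlists), 2):
        shared = wordlists[s1].keys() & wordlists[s2].keys()
        if shared:
            total = sum(
                normalized_distance(wordlists[s1][c], wordlists[s2][c])
                for c in shared
            )
            result[(s1, s2)] = total / len(shared)
    return result


# Hypothetical toy data: invented forms for two concepts at three sites.
toy = {
    "SiteA": {"concept1": "taras", "concept2": "buran"},
    "SiteB": {"concept1": "tahas", "concept2": "bulan"},
    "SiteC": {"concept1": "tagas", "concept2": "bulat"},
}

for pair, dist in site_distances(toy).items():
    print(pair, round(dist, 3))
```

A site-by-site matrix of this kind is the sort of aggregate input that MDS and clustering then operate on; in the study itself these steps were carried out in Gabmap.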

Author Biographies

Dedy Ari Asfar, Tanjungpura University

Teachers’ Training and Education Faculty

Syarifah Lubna, National Research and Innovation Agency (BRIN)

Language, Literature, and Community Research Center

Irmayani Abdulmalik, National Research and Innovation Agency (BRIN)

Language, Literature, and Community Research Center

Wiwin Erni Siti Nurlina, National Research and Innovation Agency (BRIN)

Language, Literature, and Community Research Center

Edi Setiyanto, National Research and Innovation Agency (BRIN)

Language, Literature, and Community Research Center

Sutarsih, National Research and Innovation Agency (BRIN)

Language, Literature, and Community Research Center

Yusup Irawan, National Research and Innovation Agency (BRIN)

Language, Literature, and Community Research Center

Binar Kurniasari Febrianti, National Research and Innovation Agency (BRIN)

Language, Literature, and Community Research Center

Yeni Yulianti, National Research and Innovation Agency (BRIN)

Language, Literature, and Community Research Center

Sarwo Ferdi Wibowo, National Research and Innovation Agency (BRIN)

Manuscript, Literature, and Oral Tradition Research Center

Febyasti Davela Ramadini, National Research and Innovation Agency (BRIN)

Language, Literature, and Community Research Center

Ajeng Rahayu Tjaraka, National Research and Innovation Agency (BRIN)

Language, Literature, and Community Research Center

References

Asfar, D. A. (2014). Klasifikasi bahasa Dayak Pruwan sebagai bahasa Bidayuhik [Classification of Pruwan Dayak as a Bidayuhic language]. Kandai, 10(2), 138–152. https://doi.org/10.26499/jk.v10i2.318

Asfar, D. A. (2015). Bahasa Ribun: Refleks fonem Proto-Melayu Polinesia dalam bahasa Ribun [The Ribun language: Proto-Malayo-Polynesian phoneme reflexes in the Ribun language]. Top Indonesia.

Asfar, D. A. (2016). Kearifan lokal dan ciri kebahasaan teks naratif masyarakat Iban [Local wisdom and linguistic features of Iban narrative texts]. Litera, 15(2), 366–378. https://doi.org/10.21831/ltr.v15i2.11835

Beier, C., & Epps, P. (2020). Reflections on fieldwork: A view from Amazonia. Language Documentation and Conservation, Special Issue, (15), 321–329.

Bonilla, J. E. (2023). Superdialects, Dialects, and Subdialects of Colombian Spanish. Lexis (Peru), 47(2), 536–564. https://doi.org/10.18800/lexis.202302.002

Chambers, J. K. (2015). Dialectology. In International Encyclopedia of the Social & Behavioral Sciences: Second Edition. https://doi.org/10.1016/B978-0-08-097086-8.52005-4

Chebanne, A. (2016). Writing Khoisan: Harmonized orthographies for development of under-researched and marginalized languages: The case of Cua, Kua, and Tsua dialect continuum of Botswana. Language Policy, 15(3), 277–297. https://doi.org/10.1007/s10993-015-9371-1

Chong, S., & Gedat, R. A. (2012). An introduction to the Austronesian languages in western Borneo. Language and Linguistics, 13(2), 321–349.

Collins, J. T. (2018). The Sekujam language of West Kalimantan (Indonesia). Wacana, 19(2), 425–458. https://doi.org/10.17510/wacana.v19i2.702

Collins, J. T. (2021). Keberagaman Bahasa dan Etnisitas di Kalimantan Barat [Language Diversity and Ethnicity in West Kalimantan]. Pontianak: Indonesia Melestarikan Bahasa Ibu.

Coluzzi, P., Riget, P. N., & Wang, X. (2013). Language vitality among the Bidayuh of Sarawak (East Malaysia). Oceanic Linguistics, 52(2), 375–395. https://doi.org/10.1353/ol.2013.0019

Contandriopoulos, D., Sapeha, H., & Larouche, C. (2019). Some insights related to social network analysis data collection challenges: A research note. International Journal of Social Research Methodology, 22(5), 463–468. https://doi.org/10.1080/13645579.2019.1574957

Dezsö, J. (2016). A magyar történeti dialektológia korszakai [Periods of Hungarian historical dialectology]. Magyar Nyelv, 112(1), 17–31. https://doi.org/10.18349/MagyarNyelv.2016.1.17

Dunn, J. (2019). Global Syntactic Variation in Seven Languages: Toward a Computational Dialectology. Frontiers in Artificial Intelligence, 2, 1–22. https://doi.org/10.3389/frai.2019.00015

Effendy, C., Sulissusiawan, A., Syahrani, A., Jupitasari, M., Asfar, D. A., & Lubna, S. (2023). Marine fauna lexicon of Malay community in West Kalimantan. AIP Conference Proceedings, 2913, 060017. https://doi.org/10.1063/5.0175681

François, A. (2020). In search of island treasures: Language documentation in the Pacific. Language Documentation and Conservation, 15(Special Issue), 276–294.

Francois, S., Wu, K., Doe, E., Tucker, A., & Theall, K. (2023). The influence of racial violence in neighborhoods and schools on the psycho-behavioral outcomes in adolescence. Research in Human Development, 20(1–2), 48–64. https://doi.org/10.1080/15427609.2023.2171694

Huisman, J. L. A., Franco, K., & van Hout, R. (2021). Linking linguistic and geographic distance in four semantic domains: Computational geo-analyses of internal and external factors in a dialect continuum. Frontiers in Artificial Intelligence, 4, 1–19. https://doi.org/10.3389/frai.2021.668035

Irawan, Y., Setiawan, F. A., Asfar, D. A., Irmayani, Herpanus, & Pramulya, M. (2024). Lexical and post-lexical prosodic documentation of Embaloh language. Issues in Language Studies, 13(1), 22–40. https://doi.org/10.33736/ils.6025.2024

Isaías, P., Pífano, S., & Miranda, P. (2012). Subject recommended samples: Snowball sampling. In Information Systems Research and Exploring Social Artifacts: Approaches and Methodologies (pp. 43–57). https://doi.org/10.4018/978-1-4666-2491-7.ch003

Kehrein, R. (2012). Linguistic Atlases: Empirical Evidence for Dialect Change in the History of Languages. In The Handbook of Historical Sociolinguistics. https://doi.org/10.1002/9781118257227.ch26

Kessler, B. (1995). Computational dialectology in Irish Gaelic. In Proceedings of the Seventh Conference on European Chapter of the Association for Computational Linguistics (pp. 60–66). Morgan Kaufmann Publishers Inc. https://doi.org/10.3115/976973.976983

Kristophson, J. (2013). Theory of dialect (descriptive). In Die slavischen Sprachen / The Slavic Languages. Halbband 2 (pp. 2061–2067). Retrieved February 11, 2025, from https://www.scopus.com/inward/record.uri?eid=2-s2.0-85119566486&partnerID=40&md5=8d9cc21b95de7d12077103b5bd33de09

Lafkioui, M. B. (2018). The Rif Berber language continuum: An algorithmic geolinguistic study. In HuldeAlbum Voor Jacques Van Keymeulen, hal-01914354. Retrieved February 11, 2025, from https://hal.science/hal-01914354/document

Leinonen, T., Çöltekin, Ç., & Nerbonne, J. (2016). Using Gabmap. Lingua, 178, 71–83. https://doi.org/10.1016/j.lingua.2015.02.004

Lendik, L. S., & Yuit, C. M. (2021). A preliminary study on the use of epithets in Kenyah Long Wat. Journal on Asian Linguistic Anthropology, 3(1), 56–75. https://doi.org/10.47298/jala.v3-i1-a3

Lindström, L., & Pilvik, M.-L. (2018). Korpuspõhine kvantitatiivne dialektoloogia [Corpus-based quantitative dialectology]. Keel ja Kirjandus, 61(8–9), 643–662.

Markus, M. (2022). A critical assessment of English dialect feature catalogues: Towards a dialectometrical evaluation of the English Dialect Dictionary Online. Lingua, 279. https://doi.org/10.1016/j.lingua.2022.103428

Mikuleniene, D. (2013). Contemporary linguistic situation in Lithuania: Geolinguistic aspects and new descriptive possibilities. Acta Baltico-Slavica, 37, 459–471. https://doi.org/10.11649/abs.2013.031

Mwelwa, J., & Spencer, B. (2013). A bilingual (Bemba/English) teaching resource: Realising agency from below through teaching materials designed to challenge the hegemony of English. Language Matters, 44(3), 51–68. https://doi.org/10.1080/10228195.2013.840011

Nath, P. K. (2008). Doing fieldwork on the Singpho language of North Eastern India. Cambridge University Press. https://doi.org/10.1017/UPO9788175968431.016

Nerbonne, J., Colen, R., Gooskens, C., Kleiweg, P., & Leinonen, T. (2011). Gabmap—A web application for dialectology. Dialectologia, II(SPEC. ISSUE 2), 65–89. https://raco.cat/index.php/Dialectologia/article/view/245345

Nerbonne, J., Kleiweg, P., Heeringa, W., & Manni, F. (2008). Projecting Dialect Distances to Geography: Bootstrap Clustering vs. Noisy Clustering. In C. Preisach, H. Burkhardt, L. Schmidt-Thieme, & R. Decker (Eds.), Data Analysis, Machine Learning and Applications (pp. 647–654). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_76

Nerbonne, J., & Kretzschmar Jr., W. (2006). Progress in dialectometry: Toward explanation. Literary and Linguistic Computing, 21(4), 387–397. https://doi.org/10.1093/llc/fql034

Nerbonne, J., & Kretzschmar, W. A. (2013). Dialectometry. Literary and Linguistic Computing, 28(1), 2–12. https://doi.org/10.1093/llc/fqs062

Nevaci, M. (2016). O cercetare sociolingvistica asupra dialectului aromân [A Sociolinguistic Research on the Aromanian dialect]. Fonetica si Dialectologie, 35, 145–154.

Nguyen, D., & Eisenstein, J. (2017). A Kernel independence test for geographical language variation. Computational Linguistics, 43(3), 567–592. https://doi.org/10.1162/COLI_a_00293

Pröll, S. (2013). Detecting structures in linguistic maps: Fuzzy clustering for pattern recognition in geostatistical dialectometry. Literary and Linguistic Computing, 28(1), 108–118. https://doi.org/10.1093/llc/fqs059

Smith, A. D. (2021). The historical phonology of Hliboi, a Bidayuh language of Borneo. Oceanic Linguistics, 60(1), 133–159. https://doi.org/10.1353/ol.2021.0004

Spencer, P. T. (2024). Documenting Endangered Languages with LangDoc: A Wordlist-Based System and A Case Study on Moklen. FieldMatters 2024—3rd Workshop on NLP Applications to Field Linguistics—Proceedings of the Workshop (pp. 28–36). Retrieved February 11, 2025, from https://www.scopus.com/inward/record.uri?eid=2-s2.0-85204305321&partnerID=40&md5=8b870e068336e500a469c5dc988d2787

Spruit, M. R. (2006). Measuring syntactic variation in Dutch dialects. Literary and Linguistic Computing, 21(4), 493–505. https://doi.org/10.1093/llc/fql043

Sung, H. W. M., Prokić, J., & Chen, Y. (2024). A New Dataset for Tonal and Segmental Dialectometry from the Yue- and Pinghua-Speaking Area. In SIGTYP 2024—6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Proceedings of the Workshop (pp. 25–36). https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189635282&partnerID=40&md5=b0c08abb36e6fba15b92311f5e3b2f82

Wei, W., & Schnell, J. (2025). The Routledge Handbook of Endangered and Minority Languages. Routledge. https://doi.org/10.4324/9781003439493

Wieling, M., & Nerbonne, J. (2015). Advances in Dialectometry. Annual Review of Linguistics, 1(1), 243–264. https://doi.org/10.1146/annurev-linguist-030514-124930

Wieling, M., Nerbonne, J., & Baayen, R. H. (2011). Quantitative social dialectology: Explaining linguistic variation geographically and socially. PLoS ONE, 6(9), 1–14. https://doi.org/10.1371/journal.pone.0023613

Wieling, M., Sassolini, E., Cucurullo, S., & Montemagni, S. (2016). ALT explored: Integrating an online dialectometric tool and an online dialect atlas. In Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 (pp. 3265–3272). Retrieved February 11, 2025, from https://www.scopus.com/inward/record.uri?eid=2-s2.0-85037117736&partnerID=40&md5=02b3dea69e1081b3cdaeadb1cc95667f

Yumnam, G., & Singh, C. I. (2024). A Bibliometric Perspective of Regional Languages on Select Scholarly Articles. DESIDOC Journal of Library and Information Technology, 44(1), 37–44. https://doi.org/10.14429/djlit.44.1.18938

Published

2026-03-02

Section

Articles