The University of Hawaiʻi at Mānoa (UHM) Department of Linguistics is engaging with generative artificial intelligence across theoretical, practical, and ethical dimensions. We are particularly committed to ensuring that Pacific languages and their speakers have a voice in the ongoing development of AI technologies. The AI Working Group welcomes all faculty and students interested in these critical issues.
AI and Linguistic Research
Large language models (LLMs) offer unprecedented resources for understanding language evolution, psycholinguistic behavior, and language acquisition. Working Group members are currently exploring how grammatical features emerge within these models, providing new insights into the mechanisms by which LLMs appear to acquire language. This research opens novel avenues for testing linguistic theories and understanding the fundamental processes of language learning, production, and comprehension.
AI for Language Documentation and Conservation
AI has enormous potential to accelerate language documentation and conservation efforts. The Working Group shares strategies and techniques for leveraging AI across the tasks that form the foundation of language documentation work, including:
- Automated interlinear glossing
- Lexical database management and organization
- Audio and video transcription
- Development of educational materials such as dictionaries, storybooks, and teaching aids for Indigenous languages
These applications can significantly reduce the time and resources required for documentation projects while improving accessibility for community members and researchers.
Ethical Challenges in AI and Linguistics
UHM is actively working to develop ethical guidelines for AI use in linguistics. Like any transformative technology, generative AI raises complex ethical questions. For the language sciences, two potentially competing challenges have emerged:
Data Sovereignty and Indigenous Languages
Language documentation efforts over recent decades have yielded vast corpora of naturalistic language samples housed in dedicated repositories. Much of this material was collected with explicit consent for use by communities and researchers. However, this consent did not anticipate the ingestion of language data—including recordings and annotations—into AI language models. Without clear guidelines and protocols, Indigenous communities risk losing control over their linguistic heritage.
Representation in AI Development
Equally concerning is the risk that Large Language Models develop without meaningful input from Indigenous languages and their communities of speakers/signers. As models approach generalized reasoning capabilities, there is a significant danger that such reasoning will be biased toward non-Indigenous worldviews—much as 20th-century linguistic theories were biased toward European languages.
Toward Ethical Guidelines
These ethical issues mirror challenges faced during the digital transition at the end of the 20th century. Just as new frameworks were developed to handle digital data and internet dissemination, the Working Group is committed to developing frameworks that protect Indigenous data sovereignty while ensuring diverse linguistic perspectives inform AI development. Our goal is to create guidelines that serve both the advancement of linguistic science and the rights and interests of Indigenous communities.

University of Hawaiʻi at Mānoa Department of Linguistics
1890 East West Road, Moore Hall 569
Honolulu, Hawaiʻi 96822 USA
Office Hours: M-F, 8 AM – 4:30 PM
+1 (808) 956-8602 / linguist@hawaii.edu
