A Large-Scale Annotated Dataset of Contemporary Latter-day Saint Worship Services in the United States
Spencer Dean Stewart, Purdue University
Marshall Butler, Purdue University
Abstract. The transition to hybrid worship beginning with the COVID-19 pandemic has provided scholars with an opportunity to analyze religious language at a much larger scale than previously possible. This article introduces a novel dataset of weekly worship services from the Church of Jesus Christ of Latter-day Saints (LDS) that was created to study local religious dynamics within a highly centralized institutional context. We detail the collection of transcripts via the YouTube API (2022–2025) and the development of a multi-stage annotation pipeline that incorporates large language models (LLMs) to identify individual speakers and available attributes such as age and gender. Validation against human coding confirms accuracy exceeding 96 percent across key variables. Furthermore, demographic analysis demonstrates that the sample covers counties containing 58 percent of the U.S. LDS population and is politically and culturally representative, with only a moderate urban-infrastructural bias. By providing a structured, verifiable corpus of contemporary vernacular theology, this dataset offers scholars a new resource for understanding weekly religious practice in the United States from roughly 2022 to 2025.
Stewart, Spencer Dean and Marshall Butler. 2026. “A Large-Scale Annotated Dataset of Contemporary Latter-day Saint Worship Services in the United States,” Journal of the Mormon Social Science Association 4, no. 1: 1–16.
https://doi.org/10.54587/JMSSA.0401