| dc.contributor | Universitat Ramon Llull. Esade | |
| dc.contributor.author | Thomas, Llewellyn | |
| dc.contributor.author | Romasanta, Angelo Kenneth | |
| dc.contributor.author | Pujol Priego, Laia | |
| dc.date.accessioned | 2026-03-17T09:09:13Z | |
| dc.date.available | 2026-03-17T09:09:13Z | |
| dc.date.issued | 2026-01 | |
| dc.identifier.issn | 0148-2963 | ca |
| dc.identifier.uri | http://hdl.handle.net/20.500.14342/6069 | |
| dc.description.abstract | Large Language Models (LLMs) are increasingly viewed as a valuable tool for academic research. While LLMs offer clear benefits, the ‘crisis of replicability’ in management scholarship militates against their unrestrained use. In this paper, we investigate the reproducibility of LLM analyses. We analyze three LLMs (ChatGPT, Claude, and Mistral) over fifteen weeks, testing their consistency, their accuracy, and the interaction between the two by applying the same prompts to the same data corpus. While our results demonstrate significant variation in reliability and consistency across the three LLMs, we also show that LLMs can exhibit deterministic and reliable behavior under specific, well-defined constraints. We argue that replicable LLM-based research will rely on understanding and validating the task-specific operational boundaries of the LLM. To ensure the responsible integration of LLMs into management research, we highlight the need for robust frameworks, transparency, ethical guidelines, and ongoing evaluation. We conclude with actionable guidance for management researchers. | ca |
| dc.format.extent | 14 p. | ca |
| dc.language.iso | eng | ca |
| dc.publisher | Elsevier Inc. | ca |
| dc.relation.ispartof | Journal of Business Research, Vol. 203, 115804 | ca |
| dc.rights | © The author(s) | ca |
| dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
| dc.subject.other | Generative AI | ca |
| dc.subject.other | LLM | ca |
| dc.subject.other | Replication | ca |
| dc.subject.other | Reproducibility | ca |
| dc.subject.other | Consistency | ca |
| dc.subject.other | Accuracy | ca |
| dc.title | Jagged competencies: Measuring the reliability of generative AI in academic research | ca |
| dc.type | info:eu-repo/semantics/article | ca |
| dc.rights.accessLevel | info:eu-repo/semantics/openAccess | |
| dc.embargo.terms | none | ca |
| dc.identifier.doi | https://doi.org/10.1016/j.jbusres.2025.115804 | ca |
| dc.description.version | info:eu-repo/semantics/publishedVersion | ca |
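The abstract above describes a reproducibility test built on repeated runs of identical prompts against the same corpus. What follows is a minimal sketch of that kind of consistency/accuracy probe, assuming the OpenAI Python client (>=1.0); the model name, prompt, and gold label are illustrative placeholders, not the authors' actual protocol, models, or corpus.

```python
# Minimal sketch of a consistency/accuracy probe for an LLM.
# Assumes the OpenAI Python client (>=1.0) and an OPENAI_API_KEY in the environment.
from collections import Counter

from openai import OpenAI

client = OpenAI()

PROMPT = "Classify the sentiment of this review as positive or negative: ..."
GOLD = "positive"  # hypothetical expected label, used for accuracy scoring
N_RUNS = 10

responses = []
for _ in range(N_RUNS):
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; the paper's exact versions differ
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # request nominally deterministic decoding
    )
    responses.append(reply.choices[0].message.content.strip().lower())

# Consistency: share of runs agreeing with the modal (most frequent) answer.
counts = Counter(responses)
consistency = counts.most_common(1)[0][1] / N_RUNS

# Accuracy: share of runs exactly matching the gold label.
accuracy = sum(r == GOLD for r in responses) / N_RUNS

print(f"consistency={consistency:.2f} accuracy={accuracy:.2f}")
```

Setting `temperature=0` requests nominally deterministic decoding, which is one example of the specific, well-defined constraints under which the abstract reports that LLMs can behave reliably.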