Jagged competencies: Measuring the reliability of generative AI in academic research
Other authors
Publication date
2026-01
ISSN
0148-2963
Abstract
Large Language Models (LLMs) are increasingly viewed as a valuable tool for academic research. While LLMs have some benefits, a ‘crisis of replicability’ in management scholarship militates against unrestrained use. In this paper we investigate the reproducibility of LLM analyses. We analyze three LLMs (ChatGPT, Claude and Mistral) over fifteen weeks, testing consistency, accuracy, and their interaction using the same prompts on the same data corpus. While our results demonstrate significant variations in reliability and consistency across the three LLMs, we also show that LLMs can exhibit deterministic and reliable behavior under specific, well-defined constraints. We argue that replicable LLM-based research will rely on understanding and validating the task-specific operational boundaries of the LLM. To ensure the responsible integration of LLMs into management research, we highlight a need for robust frameworks, transparency, ethical guidelines, and ongoing evaluation. We conclude with actionable guidance for management researchers.
Document type
Article
Document version
Published version
Language
English
Keywords
Pages
14 p.
Published by
Elsevier Inc.
Published in
Journal of Business Research, Vol. 203, 115804
Rights
© The author(s)
Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by-nc-nd/4.0/


