Evaluation of the programming skills of Large Language Models

Type
05 - Research report or working paper
Publisher / Publishing institution
arXiv
Place of publication / Event location
Ithaca
Abstract
The advent of Large Language Models (LLMs) has revolutionized the efficiency and speed with which tasks are completed, marking a significant leap in productivity through technological innovation. As these chatbots tackle increasingly complex tasks, assessing the quality of their outputs has become paramount. This paper critically examines the output quality of two leading LLMs, OpenAI's ChatGPT and Google's Gemini AI, by comparing the programming code generated by their free versions. Through a real-world example coupled with a systematic dataset, we investigate the code quality produced by these LLMs. Given their notable proficiency in code generation, this aspect of chatbot capability presents a particularly compelling area for analysis. Furthermore, programming code often grows complex enough that verifying it becomes a formidable task, underscoring the importance of our study. This research aims to shed light on the efficacy and reliability of LLMs in generating high-quality programming code, an endeavor with significant implications for software development and beyond.
Language
English
Created during FHNW affiliation
Yes
Publication status
Published
Review
No peer review
License
http://rightsstatements.org/vocab/InC/1.0/
Citation
Heitz, L., Chamas, J., & Scherb, C. (2024). Evaluation of the programming skills of Large Language Models. arXiv. https://doi.org/10.48550/arXiv.2405.14388