Challenges and Opportunities for Prompt Management: Empirical Investigation of Text-based GenAI Users

Nitish Patkar, University of Applied Sciences and Arts Northwestern Switzerland, Windisch, Switzerland, nitish.patkar@fhnw.ch (https://orcid.org/0000-0001-8084-4980)
Anton Fedosov, University of Applied Sciences and Arts Northwestern Switzerland, Windisch, Switzerland, anton.fedosov@fhnw.ch (https://orcid.org/0000-0003-1604-2419)
Martin Kropp, University of Applied Sciences and Arts Northwestern Switzerland, Windisch, Switzerland, martin.kropp@fhnw.ch (https://orcid.org/0000-0002-7439-6517)

Abstract
Generative AI (genAI) tools, like ChatGPT, have become popular not only with everyday users but also with Human-Computer Interaction (HCI) researchers and practitioners. Despite their rapid adoption, there is a lack of studies examining their design, particularly regarding prompt handling, organization, and management. Our empirical survey study, involving 61 genAI tool users, addresses this gap by investigating the usability and user experience of the current features of these tools. We illustrate that advanced search and labeling functionalities and innovative interface designs can significantly enhance the user experience and support reflection on sustainability when using this technology. As genAI approaches the so-called “Trough of Disillusionment” (in Gartner’s Hype Cycle terms),¹ our research aims to guide the design of genAI tools toward a more pragmatic and practical fit with end-user practices, ensuring that technology adoption comes with a deeper understanding of its capabilities and offerings.

CCS Concepts
• Human-centered computing → Empirical studies in interaction design.

Keywords
Generative AI, AI chatbot, LLM, Empirical Study, User survey, Prompt management

¹ https://www.gartner.com/en/research/methodologies/gartner-hype-cycle

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Mensch und Computer 2024 – Workshopband, Gesellschaft für Informatik e.V., 01.-04. September 2024, Karlsruhe, Germany. © 2024 Copyright held by the owner/author(s). Publication rights licensed to GI. https://doi.org/10.18420/muc2024-mci-ws09-155

1 Introduction
Generative AI tools have rapidly gained popularity: ChatGPT alone attracted 100 million users and 590 million visits within two months of its launch, by January 2023 [16]. This surge underscores the transformative potential of the genAI market, which is expected to grow at an annual rate of 24.4% from 2023 to 2030, significantly impacting user interaction paradigms [10, 16]. Against this backdrop, HCI researchers have outlined how designers can effectively employ these tools to support Human-Centered Design processes [7], reflected on how “Large Whatever Models” can transform HCI research cycles [20], and examined their potential and implications for HCI education and pedagogy [11].

Despite widespread adoption and use, there is a notable gap in HCI research focusing on the design and usability of these tools, particularly concerning prompt management, handling, and organization. While recent HCI and computing studies have begun to explore prompt engineering widely (e.g., [5, 14, 26]), the practical aspects of how users handle, organize, and manage prompts remain largely under-investigated.
Efficient prompt management, in turn, is crucial both for reducing inefficiencies that lead to frustrating user experiences and for mitigating the environmental costs associated with redundant interactions.

Our empirical survey study addresses this gap by investigating the usability and user experience of current genAI tools. We collected data from 61 users of text-based genAI tools (e.g., ChatGPT, Claude) to understand their routines, challenges, and needs regarding prompt management, that is, how they organize past prompts, their overall experience with chatbot interfaces, and the effectiveness of existing features for managing interaction histories. This research aims to guide the design of more intuitive and efficient genAI tools, with the goal of enhancing user experience throughout their use. Additionally, it calls for reflection on supporting sustainable consumption practices, given the growing energy demands of such technology [4]. Consequently, we aim to answer the following research question: How do users perceive the current prompt management features in genAI tools, and what specific improvements do they seek to enhance their interaction experience? The gathered insights identify opportunities to make prompt management more intuitive and efficient, and thereby to improve how genAI tools are designed, operated, and adopted.

2 Related work
Our work draws on related research in chatbot use and design published over the last five years. Maroengsit et al. provide a comprehensive review of chatbot evaluation methods, focusing on effectiveness, efficiency, and user satisfaction [15]. Hussain et al. discuss the evolution of chatbots from simple scripts to advanced systems that use AI and machine learning to enhance NLP capabilities [9]. Singh and Thakur analyzed the development of chatbots, emphasizing the transition to more sophisticated systems [23]. They highlighted the integration of Semantic Nets and Machine Learning to improve chatbots’ ability to remember facts from conversations and accurately identify user intents, which enhances the overall interaction quality. Almansor and Hussain’s survey examines the role of AI in chatbots, identifying key challenges and future research areas in HCI: maintaining contextual understanding, generating contextually appropriate responses, and incorporating sentiment analysis [2].

Furthermore, our work is motivated by nascent research on improving the usability of AI chatbots. Akma et al. examined design techniques relevant to sectors such as education, healthcare, and customer service [1], while Vishwakarma et al. focused on methodologies for constructing chatbots [25]. Borsci et al. identified 27 key attributes for user satisfaction in CRM chatbots, aiding in the evaluation and design of chatbot interactions [3]. Among the most relevant to our study are response time, the chatbot’s ability to handle multi-thread conversations, and the perceived ease of use. Ren et al. conducted a systematic mapping study on the usability of chatbots, particularly personal assistants in healthcare, pinpointing crucial measures such as satisfaction, efficiency, and effectiveness [18].
Despite these recent research efforts, targeted research on the user interface and user experience aspects of genAI tools remains practically absent and is sorely needed.

3 Study Design
We designed a cross-sectional survey following the guidelines by Kitchenham et al. [12] and Shull et al. [22]. The first author created the survey instrument, and the second author reviewed it. The feedback led to minor corrections in the wording of some questions; no major problems were identified.

3.1 Survey Instrument
The survey was divided into four sections with a mix of single-select, multiple-choice, and open-text questions:²
(1) User Familiarity and Use Cases: Seven questions to gauge respondents’ familiarity with generative AI tools and identify common use cases.
(2) Prompt Management Challenges: Four questions aimed at understanding the difficulties users face when revisiting past conversations, locating specific information, and managing prompts.
(3) Satisfaction with Current Features: Seven questions assessing user satisfaction with existing features of generative AI tools, especially those related to prompt management.
(4) Desired Features for Improvement: Seven questions soliciting user opinions on potential features that could enhance prompt management.

² The questionnaire is available at: https://doi.org/10.6084/m9.figshare.26012959.v1

We collected data for a month, from April 14 until May 15, 2024, and used Google Forms to administer the questionnaire. To ensure completeness, all closed-ended questions were marked as mandatory, minimizing the risk of partial responses.

3.2 Population and sample
The target population comprised individuals with experience using text-based genAI tools (e.g., ChatGPT, Gemini, Claude), without restriction to specific tools or domains of use. The main sampling method was self-recruiting [17]: we published invitations on the social media platform LinkedIn. The secondary sampling method involved sending direct invitations to individuals from our professional networks.

3.3 Data Analysis and Validation
Responses were automatically collected in a Google Sheet, which is available in the online supplementary material. We used frequency analysis to analyze the single- and multiple-choice responses and employed an open coding strategy for the open-ended questions, aiming to extract concrete feature requests and identify overarching themes pertinent to the usability of prompt management [22]. Through recurrent meetings, we reviewed the emerging clusters of code groups and conceptualized the themes. We refined them through iterative discussions, resolving ambiguities in code grouping and reaching a consensus on the final naming and composition of the high-level themes.
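For the closed-ended questions, the frequency analysis amounts to tallying answer options per respondent. The following is a minimal sketch of this step, assuming the Google Sheet is exported as CSV; the file and column names are illustrative placeholders, not the actual questionnaire fields.

import pandas as pd

# Load the exported responses; "survey_responses.csv" is a placeholder name.
responses = pd.read_csv("survey_responses.csv")

# Single-select question: relative frequencies per answer option.
usage = responses["usage_frequency"].value_counts(normalize=True) * 100
print(usage.round(1))  # e.g., "Several times a week    44.3"

# Multiple-choice question: Google Forms stores selections as one
# comma-separated cell, so split before counting each chosen option.
# Percentages are relative to respondents, so they can sum to over 100.
purposes = responses["usage_purposes"].str.split(", ").explode()
print((purposes.value_counts() / len(responses) * 100).round(1))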
4 Survey Results
We received a total of 61 responses. In terms of the primary work domain, the majority of respondents (37) were employed in the Technology/IT sector, followed by 13 in Technology/IT with a specific focus on Education. Other domains included Pharma and Healthcare; see Figure 1.

[Figure 1: Respondent background. (a) Primary domain; (b) Primary role.]

A significant number of respondents, especially those in technology and education, interact with genAI tools frequently: 27 individuals use them several times a week and 21 use them daily. Lastly, the most common purposes for using genAI tools were communication and writing (82%) and assistance in developing code (80.3%). Other uses include educational aid, such as support in learning new topics or subjects, including tutoring or explanatory content (63.9%), and data analysis (23%).

We asked participants about the challenges of reviewing past conversations, locating specific information, finding prompts, and sorting or filtering conversation histories. The responses indicate that most users (59%) rarely or never review their past conversations with genAI tools (see Figure 2). Additionally, 39% of the respondents said they never needed to find a past prompt, while 49% reported that finding a past prompt is difficult. Lastly, 34% of the respondents never felt a need to sort or filter their past conversations, while 55% do not feel particularly restricted by the lack of sorting/filtering options.

[Figure 2: Finding information. (a) How often respondents revisit history; (b) Ease of finding information.]

We also asked participants for their opinions on a few concrete features related to prompt management. 49% of the respondents said that a history sorted by date is not particularly useful, while 42% indicated that a full-text search over past history would be helpful. Lastly, 42% said tagging or labeling of prompts could be very useful, with a total of 64% generally finding it useful.

Finally, we asked participants to list desired features to aid their workflows with genAI tools. The following features were specifically requested by our participants:
(1) Prompt Suggestion System: Dynamic suggestions for completing prompts based on initial input.
(2) Conversation Management Options: Ability to choose whether to save or delete conversations after completion.
(3) Collaborative Use: Shared access to AI functionalities among users in the same project.
(4) AI-assisted Search for Conversations: Chatbot assistance in finding specific past conversations.
(5) Chatbot-Specific Prompt Lists: Lists of effective prompts tailored to each chatbot’s capabilities and updates.
(6) Template System with Placeholders: Customizable templates for generating content (see the sketch after this list).
(7) Tagging for Interlinking: Use of tagging to organize and interlink concepts.
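To make feature (6) concrete, the following is a minimal sketch of a template system with placeholders, using only the Python standard library; the template text and field names are invented for illustration and are not drawn from our survey data.

from string import Template

# A reusable prompt template; $-prefixed names are the placeholders
# a user would fill in before sending the prompt.
summary_template = Template(
    "Summarize the following $document_type in at most $word_limit words "
    "for an audience of $audience:\n\n$content"
)

prompt = summary_template.substitute(
    document_type="meeting transcript",
    word_limit="150",
    audience="project managers",
    content="<pasted transcript>",
)
print(prompt)

In a genAI tool, such templates could be stored, tagged, and shared alongside the conversation history, which would also serve the collaborative use (3) and tagging (7) requests.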
5 Discussion
Our findings point to concrete improvements in the design of generative AI tools. Additionally, they open a discussion about the impact these features have on user behavior and sustainability.

5.1 Redesigning for Efficient Prompt Management
Our survey revealed that users rarely review past conversations due to ineffective search functionalities in genAI tools, whose interfaces prioritize entering new prompts over searching history. This encourages users to re-prompt rather than search past interactions.

Respondents did not perform typical tasks like searching, sorting, and filtering because these features were unavailable. This highlights the need for genAI tool designs that integrate these tasks to enhance user workflows.

GenAI tools and instant messaging (IM) apps like WhatsApp both rely on conversation as the main interaction type, though they are organized differently. For instance, unlike IM tools, genAI tools involve users “searching” for an answer and following up with prompts until they are satisfied with the response. Additionally, IM apps support multimodal interactions (voice, images, video), while genAI tools, like ChatGPT, primarily focus on textual questions and answers. Despite these differences, features like searching and tagging, long established in IM apps, are still missing in genAI tools.

We speculate that our respondents rarely search their history due to the suboptimal and immature design of contemporary genAI tools, even though this conceptual model is common in text-centric interfaces. Improving search functionalities and the organization of conversation history can enhance the effectiveness and efficiency of user interactions with genAI tools. This involves technical and UX challenges, such as deciding whether searches should cover entire conversations or specific prompts and how tagging can improve efficiency. Finding specific past prompts would enable new interaction possibilities, such as initiating a new conversation thread from an existing one, while also presenting challenges, such as preserving and maintaining context. We envision these improvements addressing current limitations and creating more user-friendly and sustainable genAI tools.
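As a minimal sketch of one side of this design space, the following indexes individual prompts, rather than whole conversations, for full-text search. It assumes an SQLite build with the FTS5 extension enabled (the case in most standard Python distributions); the schema, tags, and sample prompts are illustrative.

import sqlite3

db = sqlite3.connect(":memory:")
# Index at prompt granularity; keeping tags in a searchable column lets
# one query match either the prompt text or its labels.
db.execute("CREATE VIRTUAL TABLE prompts USING fts5(conversation_id, tags, text)")
db.executemany(
    "INSERT INTO prompts VALUES (?, ?, ?)",
    [
        ("c1", "python testing", "How do I mock a REST call in pytest?"),
        ("c2", "writing email", "Draft a polite follow-up email to a reviewer."),
    ],
)

# Ranked full-text query; FTS5 provides bm25-based ordering via "rank".
for conversation_id, text in db.execute(
    "SELECT conversation_id, text FROM prompts WHERE prompts MATCH ? ORDER BY rank",
    ("pytest",),
):
    print(conversation_id, text)

Prompt-level indexing makes the matching prompt, not an entire conversation, the unit of retrieval, which would also support spawning a new thread from a found prompt, as discussed above.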
5.2 The Environmental Toll of Prompting
Our participants expressed a clear preference for re-prompting over searching through history: “If I need a response again, I just ask the same question. It’s faster for me than searching for an answer. Also, I’m basically never looking for an answer to a question I asked in the past,” and “I’m not sure why to store or search history when you can always ask again and get the answer. You can also get a summary at the end by asking for it.” This routine of re-prompting highlights a major design issue and leads to significant environmental costs.

Recent research highlights the significant energy demands of large language model (LLM) chatbots and AI in general [4]: each new prompt triggers inference on GPUs or TPUs. While the energy use and emissions from a single inference are minor, the cumulative environmental impact of global chatbot services is substantial. For instance, the monthly energy and carbon footprints of these services are comparable to those of their final training sessions [10].

Similar to trust-supporting design elements in online platforms (e.g., [8]), interface features that encourage the reuse of past interactions can reduce redundant prompts and lower the carbon footprint associated with genAI use [10, 13, 24]. We hypothesize that emphasizing the effects of each prompt/transaction through simple graphical representations in the UI, such as the potential CO2 emissions generated or similar signifiers, would increase end-user understanding of the effects of genAI usage on the environment. We draw inspiration from a study by Roscam Abbing [19], who discusses a solar-powered website built as a low-tech solution, demonstrating how design can play a pivotal role in reducing the environmental footprint of digital technologies. The author employed principles from degrowth [21] and sustainable HCI [6] to build a static site structure with minimal client-side computation, significantly reducing energy use. We envision that such an innovative approach could be uniquely tailored to guide the design of sustainable genAI tools.

6 Conclusion
Our study explores the challenges and opportunities in prompt management for genAI tools, focusing on improving user experiences with them. The findings indicate that the lack of effective search and intuitive organization of prompts and conversations leads users to repeatedly prompt for similar queries, which is environmentally inefficient and contributes to a suboptimal user experience.

To aid usability and sustainability, future designs should include novel interaction capabilities, such as adequate search functionalities to facilitate better organization and retrieval of past interactions, and ‘environment-aware’ signifiers to reduce re-prompting. Implementing features like dynamic prompt suggestions, prompt templates, conversation management options, and AI-assisted search can significantly enhance end-user workflows and aid the sustainability of such systems.

In further work, we aim to explore the ecological impact of generative AI tools and propose strategies for reducing redundant prompts through efficient prompt management features. We look forward to feedback from HCI researchers during the workshop on their prompt management practices with genAI beyond the text-based tools that were the primary subject of our inquiry.

Acknowledgments
We thank Dr. Erion Elmasllari for his support in reviewing the draft of this position paper.

References
[1] Nahdatul Akma Ahmad, Mohamad Hafiz Che, Azaliza Zainal, Muhammad Fairuz Abd Rauf, and Zuraidy Adnan. 2018. Review of chatbots design techniques. International Journal of Computer Applications 181, 8 (2018), 7–10.
[2] Ebtesam H Almansor and Farookh Khadeer Hussain. 2020. Survey on intelligent chatbots: State-of-the-art and future research directions. In Complex, Intelligent, and Software Intensive Systems: Proceedings of the 13th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS-2019). Springer, 534–543.
[3] Simone Borsci, Alessio Malizia, Martin Schmettow, Frank Van Der Velde, Gunay Tariverdiyeva, Divyaa Balaji, and Alan Chamberlain. 2022. The chatbot usability scale: the design and pilot of a usability scale for interaction with AI-based conversational agents. Personal and Ubiquitous Computing 26 (2022), 95–119.
[4] Alex de Vries. 2023. The growing energy footprint of artificial intelligence. Joule 7, 10 (2023), 2191–2194.
[5] Paul Denny, Viraj Kumar, and Nasser Giacaman. 2023. Conversing with Copilot: Exploring prompt engineering for solving CS1 problems using natural language. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1. 1136–1142.
[6] Carl DiSalvo, Phoebe Sengers, and Hrönn Brynjarsdóttir. 2010. Mapping the landscape of sustainable HCI. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Atlanta, Georgia, USA) (CHI ’10). Association for Computing Machinery, New York, NY, USA, 1975–1984. https://doi.org/10.1145/1753326.1753625
[7] Passant Elagroudy, Jie Li, Kaisa Väänänen, Paul Lukowicz, Hiroshi Ishii, Wendy E Mackay, Elizabeth F Churchill, Anicia Peters, Antti Oulasvirta, Rui Prada, et al. 2024. Transforming HCI Research Cycles using Generative AI and “Large Whatever Models” (LWMs). In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 1–5.
[8] Anton Fedosov, Liudmila Zavolokina, Sina Krumhard, and Elaine M Huang. 2023. “This Could Be The Day I Die”: Unpacking Interpersonal and Systems Trust in a Local Sharing Economy Community. In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems. 1–7.
[9] Shafquat Hussain, Omid Ameri Sianaki, and Nedal Ababneh. 2019. A survey on conversational agents/chatbots classification and design techniques. In Web, Artificial Intelligence and Network Applications: Proceedings of the Workshops of the 33rd International Conference on Advanced Information Networking and Applications (WAINA-2019) 33. Springer, 946–956.
[10] Peng Jiang, Christian Sonne, Wangliang Li, Fengqi You, and Siming You. 2024. Preventing the Immense Increase in the Life-Cycle Energy and Carbon Footprints of LLM-Powered Intelligent Chatbots. Engineering (2024).
[11] Ahmed Kharrufa and Ian G Johnson. 2024. The Potential and Implications of Generative AI on HCI Education. arXiv preprint arXiv:2405.05154 (2024).
[12] Barbara A Kitchenham, Shari Lawrence Pfleeger, Lesley M Pickard, Peter W Jones, David C. Hoaglin, Khaled El Emam, and Jarrett Rosenberg. 2002. Preliminary guidelines for empirical research in software engineering. IEEE Transactions on Software Engineering 28, 8 (2002), 721–734.
[13] Baolin Li, Yankai Jiang, Vijay Gadepally, and Devesh Tiwari. 2024. Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference. arXiv preprint arXiv:2403.12900 (2024).
[14] Vivian Liu and Lydia B Chilton. 2022. Design guidelines for prompt engineering text-to-image generative models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 1–23.
[15] Wari Maroengsit, Thanarath Piyakulpinyo, Korawat Phonyiam, Suporn Pongnumkul, Pimwadee Chaovalit, and Thanaruk Theeramunkong. 2019. A survey on evaluation methods for chatbots. In Proceedings of the 2019 7th International Conference on Information and Education Technology. 111–119.
[16] Dan Milmo. 2023. ChatGPT reaches 100 million users two months after launch. The Guardian (2023).
[17] Teade Punter, Marcus Ciolkowski, Bernd Freimut, and Isabel John. 2003. Conducting on-line surveys in software engineering. In 2003 International Symposium on Empirical Software Engineering, 2003. ISESE 2003. Proceedings. IEEE, 80–88.
[18] Ranci Ren, Mireya Zapata, John W Castro, Oscar Dieste, and Silvia T Acuña. 2022. Experimentation for chatbot usability evaluation: A secondary study. IEEE Access 10 (2022), 12430–12464.
[19] Roel Roscam Abbing. 2021. ‘This is a solar-powered website, which means it sometimes goes offline’: a design inquiry into degrowth and ICT. In LIMITS ’21, June 14–15, 2021, Virtual Workshop. PubPub.
[20] Albrecht Schmidt, Passant Elagroudy, Fiona Draxler, Frauke Kreuter, and Robin Welsch. 2024. Simulating the Human in HCD with ChatGPT: Redesigning Interaction Design with AI. Interactions 31, 1 (Jan 2024), 24–31. https://doi.org/10.1145/3637436
[21] Vishal Sharma, Neha Kumar, and Bonnie Nardi. 2023. Post-growth Human–Computer Interaction. ACM Trans. Comput.-Hum. Interact. 31, 1, Article 9 (Nov 2023), 37 pages. https://doi.org/10.1145/3624981
[22] Forrest Shull, Janice Singer, and Dag IK Sjøberg. 2007. Guide to Advanced Empirical Software Engineering. Springer.
[23] Siddhant Singh and Hardeo K Thakur. 2020. Survey of various AI chatbots based on technology used. In 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). IEEE, 1074–1079.
[24] Jovan Stojkovic, Esha Choukse, Chaojie Zhang, Inigo Goiri, and Josep Torrellas. 2024. Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference. arXiv preprint arXiv:2403.20306 (2024).
[25] Ashutosh Vishwakarma and Ankur Pandey. 2021. A review & comparative analysis on various chatbots design. International Journal of Computer Science and Mobile Computing 10, 2 (2021), 72–78.
[26] JD Zamfirescu-Pereira, Richmond Y Wong, Bjoern Hartmann, and Qian Yang. 2023. Why Johnny can’t prompt: how non-AI experts try (and fail) to design LLM prompts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–21.