Modelling Competence(s) in Written Comparison Tests

Uwe Schürmann, Georg Bruckmaier
University of Applied Sciences and Arts Northwestern Switzerland, School of Education

Abstract
The present study investigates the extent to which tasks in written comparison tests for Year 8 students in Germany (VERA-8) meet established quality criteria for modelling tasks. Furthermore, it examines whether these tasks are suitable for assessing modelling competence in a theory-based manner, drawing on atomistic and holistic approaches to conceptualising modelling competence. The findings indicate that VERA-8 tasks do not fully meet the established quality criteria for modelling tasks, such as authenticity and openness. Nevertheless, the assessment of modelling competence based on atomistic and holistic approaches is, in principle, feasible.

Keywords: Mathematical modelling, Competence, Assessment, Comparison tests, Atomistic and holistic approaches, Educational standards

1 Introduction
Mathematical modelling—the process of solving realistic problems by mathematical means, characterised by an interplay between reality and mathematics (Niss et al., 2007)—is a central goal of mathematics education in many countries (Kaiser, 2020). However, analyses of test items from national course tests (Frejd, 2011, 2013) and central examinations (Greefrath et al., 2017; Siller & Greefrath, 2020) indicate that written tests frequently neglect key aspects of modelling. Even when modelling is firmly embedded in curricula, this does not guarantee that the corresponding competences are adequately assessed in examinations. This discrepancy between curricular expectations and assessment practice raises the question of whether it also occurs in other assessment contexts.

Research has examined how modelling can be fostered (Cevikbas et al., 2022) and assessed in different formats (Kaiser, 2007), although written tests remain the primary focus (Frejd, 2013). Beyond such explicitly designed tasks, modelling is also assessed in large-scale assessments (e.g., OECD, 2023), central examinations, and nationwide comparison tests such as VERA-8 (German: Vergleichsarbeiten in Jahrgangsstufe 8, i.e., comparison tests in Year 8). However, these written tests must also meet educational policy and practical requirements (Drüke-Noe, 2012), which may complicate the systematic assessment of modelling competence.

In this context, the present study examines how modelling competence is assessed in VERA-8, since these tests are designed to ensure that the educational standards defined by the German Standing Conference of the Ministers of Education and Cultural Affairs (KMK, 2004), including those relating to mathematical modelling, are implemented in schools. The study is based on the premise that modelling competence is best assessed through tasks that meet established quality criteria, such as authenticity or openness (Maaß, 2010; see also Table 4.1).

2 Theoretical Background
Competence is understood as a domain-specific, context-dependent disposition to perform, which can in principle be learnt (Weinert, 2001). Competence diagnostics therefore requires an alignment between the internal structure of a competence and the procedures used to assess it. Against this background, the present study considers both the conceptualisation of modelling competence and its assessment.
2.1 Modelling Competence(s)
Competence is essentially defined by knowledge and skills, though debate remains over whether affective aspects should also be included. Thus, modelling competence is sometimes defined with (e.g., Kaiser, 2007; Maaß, 2006) or without (e.g., Niss & Højgaard, 2019) affective and motivational components. A theoretical reconstruction of modelling competence must be distinguished from individuals' actual performance dispositions, as empirical studies demonstrate its close connection to other competences, for instance problem solving (Greefrath, 2015) and reading (Krawitz et al., 2022).

Definitions of modelling competence can generally be categorised into holistic (top-down) and atomistic (bottom-up) approaches (Cevikbas et al., 2022). A holistic approach assesses competence across the entire modelling cycle and typically characterises modelling competence in terms of levels. For instance, Greer and Verschaffel (2007) evaluate the role and use of mathematical modelling across various disciplines and within society. Henning and Keune (2007) offer a similar perspective, characterising the highest level as the analysis of and reflection on model construction, its purpose, and the criteria for its evaluation. In contrast, atomistic approaches delineate modelling competence through sub-competences corresponding to the phases of a diagnostic modelling cycle (e.g., Blum & Leiß, 2007). These typically include understanding the real-world problem, developing a real model, constructing and solving a mathematical model, interpreting results, and validating solutions. In accordance with prevailing convention, we refer to modelling competence in the singular for the overall process and to modelling competences in the plural for its sub-components. In the context of test and task design, the distinction between the two approaches is of practical relevance: atomistic approaches enable finer-grained diagnostic differentiation, whereas holistic approaches capture integrated performance.

2.2 Assessment of Modelling Competence(s)
Here, assessment is understood as the empirical evaluation of individuals' modelling competence(s), excluding purely theoretical arguments, descriptions of classroom practice, and grading. Approaches range from subtasks in standardised tests to complex projects, generating quantitative data (e.g., multiple-choice responses) or qualitative data (e.g., videos, transcripts, portfolios). These data may be static or process-oriented; the latter category includes eye-tracking data (Schindler et al., 2025) and log files from computer-based assessments that capture click patterns, response sequences, and processing times (Hankeln et al., 2025). In accordance with the principles of competence diagnostics, assessments can be formative, to support learning, or summative, to evaluate performance (see Table 2.1).

Table 2.1: Assessment of modelling competence(s)
Aspect                   Characteristic and examples
Range                    Atomistic (e.g., subtasks) ↔ Holistic (e.g., modelling projects)
Type of empirical data   Quantitative (e.g., multiple-choice responses) ↔ Qualitative (e.g., videos);
                         Static (e.g., single responses) ↔ Dynamic (e.g., eye-tracking)
Diagnostic function      Formative ↔ Summative

The focus of written tests can be exclusively on modelling (e.g., Hankeln et al., 2019), or modelling can be assessed alongside other mathematical competences. Furthermore, such tests vary in their mode of administration, ranging from paper-and-pencil to fully computer-based and hybrid formats. Task types also vary, for example in the use of tactile tools (e.g., compasses) in paper-based tests or of animations and interactive graphics in computer-based formats (e.g., Wirth & Greefrath, 2024).
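The process data mentioned above (click patterns, response sequences, and processing times) can be thought of as timestamped event records from which simple measures are derived. The following minimal Python sketch illustrates how a per-item processing time could be computed from such records; the event types, field names, and example values are our own assumptions for illustration and do not reflect any actual test platform's log specification.

```python
from dataclasses import dataclass

# Hypothetical timestamped events from a computer-based test session.
# Event types, field names, and values are illustrative assumptions,
# not an actual platform's log format.
@dataclass
class LogEvent:
    student_id: str
    item_id: str
    action: str       # e.g., "open", "click_option", "submit"
    timestamp: float  # seconds since the start of the test session

def processing_times(events: list[LogEvent]) -> dict[str, float]:
    """Derive per-item processing times from open/submit event pairs."""
    opened: dict[str, float] = {}
    times: dict[str, float] = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.action == "open":
            opened[event.item_id] = event.timestamp
        elif event.action == "submit" and event.item_id in opened:
            times[event.item_id] = event.timestamp - opened.pop(event.item_id)
    return times

events = [
    LogEvent("s01", "item_03", "open", 12.0),
    LogEvent("s01", "item_03", "click_option", 47.5),
    LogEvent("s01", "item_03", "submit", 61.5),
]
print(processing_times(events))  # {'item_03': 49.5}
```

Derived measures of this kind could complement the static correct/incorrect coding that paper-based formats provide.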
3 Research Questions
In consideration of the aforementioned factors relating to modelling competence(s) and their assessment, and based on the premise that modelling competence(s) are best assessed when students engage with tasks that meet defined criteria, the objective of this study is to determine whether and to what extent modelling competences are assessed by VERA-8 tasks. Furthermore, the study investigates the additional potential of these tasks for a theory-based assessment of modelling competence(s). The present study therefore addresses the following two research questions (RQ):
• RQ 1: To what extent do VERA-8 tasks designed to assess modelling competence meet established quality criteria for modelling tasks?
• RQ 2: To what extent do VERA-8 tasks allow for a theory-based assessment of modelling competence, taking into account both atomistic and holistic approaches to defining modelling competence(s)?
RQ 1 is an empirical question, while RQ 2 concerns theoretical considerations regarding the potential of comparison tests such as VERA-8.

4 Methods
The task dataset under consideration comprises 474 VERA-8 tasks, which are freely available online to registered teachers (https://www.aufgabenbrowser.de). According to the website's filtering function, 168 of these tasks feature at least one subtask that addresses modelling competence(s). These 168 tasks, comprising 310 subtasks (266 of which are modelling-focused), were analysed using qualitative content analysis (Kuckartz, 2019). An example is shown in Figure 4.1.

Figure 4.1: Example task "Walking in a Circle"

To address RQ 1, an empirical analysis of the test items was conducted. The coding framework (see Table 4.1) was based on Maaß's (2010) classification framework for modelling tasks, adapted for the purposes of this study. Key adaptations include the omission of Cognitive demand, the addition of an 'Other' code to Situation, the redefinition of the category Openness as Open in problem, Open in solution path, and Open in outcome (Bruder, 2005), and the alignment of Mathematical content with the subject domains of the national standards (KMK, 2004). These alterations were implemented to fit the distinct parameters of the VERA-8 test design. The adapted framework is a tailored, pragmatic approach developed specifically for VERA-8 tasks; nevertheless, it can in principle be applied to modelling tasks in written tests in general.

Table 4.1: Categories and codes
No.  Categories               Codes
1    Modelling activities     Understanding, simplifying & structuring / Mathematising / Working mathematically / Interpreting & validating
2    Data                     Matching / Missing / Redundant / Imprecise / Inconsistent
3    Relationship to reality  Authentic & close to reality / Embedded / Intentionally artificial & fantasy
4    Situation                Personal / Occupational / Public / Scientific / Other
5    Type of model            Descriptive / Normative
6    Representations          Text / Table / Picture / Sketch / Diagram / Graph
7    Openness                 Open in problem / Open in solution path / Open in outcome
8    Mathematical content     Quantity / Measurement / Space & shape / Functional relationship / Data & chance
9    Modelling level          Criticise / Reflect / Select model

Adopting Maaß's approach, categories such as Modelling activities (Blum & Leiß, 2007) and Type of model (Meyer, 1984) are deductive categories, whereas categories such as Data (Verschaffel et al., 2020) or Relationship to reality function as natural codes. Some categories allow single coding only (e.g., Relationship to reality, Type of model, Mathematical content), while others permit multiple coding (e.g., Modelling activities, Data, Situation, Representations, Openness). Coding units were individual subtasks (first-cycle coding), later aggregated at task or booklet level (second-cycle coding).
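To make the structure of the coding scheme concrete, it can be represented as a simple data structure against which each coded subtask is checked. The following Python sketch is our own illustration, not part of the study's tooling; two categories are shown, with labels taken from Table 4.1 and the single/multiple-coding flags reflecting the distinction described above.

```python
# Illustrative encoding of (part of) the coding scheme from Table 4.1.
CODING_SCHEME = {
    "Relationship to reality": {
        "codes": {"Authentic & close to reality", "Embedded",
                  "Intentionally artificial & fantasy"},
        "multiple": False,  # single coding only
    },
    "Modelling activities": {
        "codes": {"Understanding, simplifying & structuring", "Mathematising",
                  "Working mathematically", "Interpreting & validating"},
        "multiple": True,   # multiple coding permitted
    },
    # ... the remaining seven categories follow the same pattern
}

def check_subtask_coding(coding: dict[str, list[str]]) -> list[str]:
    """Return violations of the coding scheme for one coded subtask."""
    violations = []
    for category, assigned in coding.items():
        spec = CODING_SCHEME[category]
        if not spec["multiple"] and len(assigned) > 1:
            violations.append(f"{category}: only single coding allowed")
        violations += [f"{category}: unknown code '{c}'"
                       for c in assigned if c not in spec["codes"]]
    return violations

# Partial coding of "Walking in a Circle" (Figure 4.1):
example = {
    "Relationship to reality": ["Embedded"],
    "Modelling activities": ["Understanding, simplifying & structuring",
                             "Interpreting & validating"],
}
print(check_subtask_coding(example))  # [] (the coding conforms to the scheme)
```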
The coding of the subtasks was conducted independently by two raters on a binary scale. Disagreements were resolved by consensus in favour of the task; for example, if the raters disagreed on the authenticity of a task, it was rated as authentic. This procedure was designed to minimise potential negative bias resulting from subjective judgement (see the sketch at the end of this section).

For instance, the task "Walking in a Circle" (see Figure 4.1) was coded as follows: (1) Students must understand the situation and interpret and validate the provided models (i.e., the graphs). (2) Solving the task does not require engagement with numerical data; nevertheless, one piece of data is redundant (i.e., "one meter"). (3) The real-world context is not essential to solving the task, so the task was coded as embedded in reality. (4) The situation described in the task corresponds to none of the codes personal, occupational, public, or scientific; it was therefore coded as 'Other'. (5) The task involves descriptive modelling and (6) uses graphs for representation. (7) The task was coded as entirely closed. (8) In terms of mathematical content, students must deal with a functional relationship. (9) They must reflect on the given models and select the correct one.

To respond to RQ 2, the empirical findings on RQ 1 were supplemented by theoretical considerations on modelling competence(s) and their assessment, while also taking into account the practical requirements of nationwide comparison tests (Drüke-Noe, 2012).
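As announced above, the consensus rule can also be expressed compactly in code. The following Python fragment assumes that each quality criterion is coded 1 when present, so that resolution in favour of the task amounts to a logical OR over the two raters' binary codes; the feature names and ratings are invented examples.

```python
# Two raters code features of a subtask on a binary scale (1 = present).
rater_a = {"authentic": 1, "open_in_outcome": 0, "missing_data": 0}
rater_b = {"authentic": 0, "open_in_outcome": 0, "missing_data": 1}

# Resolution in favour of the task: a feature counts as present
# if either rater coded it as present (logical OR).
consensus = {feature: a | rater_b[feature] for feature, a in rater_a.items()}
print(consensus)  # {'authentic': 1, 'open_in_outcome': 0, 'missing_data': 1}

# Raw percentage agreement before resolution, as a rough reliability check:
agreement = sum(rater_a[f] == rater_b[f] for f in rater_a) / len(rater_a)
print(f"Agreement before resolution: {agreement:.2f}")  # 0.33
```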
5 Results

5.1 Fulfilment of Established Quality Criteria for Modelling Tasks (RQ 1)
Below, we describe the results in the order of the nine categories in Table 4.1.
1. Modelling activities: Only 14 of the 266 modelling-related subtasks require holistic modelling. Among the atomistic modelling items, most target mathematising (178 items) and working mathematically (169 items), while understanding, simplifying and structuring (40 items) and interpreting and validating (77 items) appear less often.
2. Data: In most cases, tasks provide exactly the data needed, so dealing with missing (20 items), imprecise (23 items), or redundant (24 items) information is rarely necessary. Only 2 items simulate data research, such as extracting figures from a table. Items with inconsistent data are absent.
3. Relationship to reality: Many tasks (92 items) are essentially mathematical problems with an embedded real-world context. While authentic (80 items) or close-to-reality (85 items) contexts are common, deliberately artificial (5 items) or fictional (2 items) situations occur only sporadically.
4. Situation: Most tasks are set in personal (208 items) or occupational (73 items) contexts. Public (16 items) and scientific (19 items) situations are addressed far less frequently, and some tasks (27 items) cannot be clearly assigned to any of the predefined codes.
5. Type of model: Normative modelling appears only in exceptional cases (5 items).
6. Representations: Around half of the subtasks (169 items) include at least one representation, with photographs (52 items) and drawings (55 items) being the most common. Mathematical representations such as diagrams (17 items), coordinate graphs (31 items), or sketches (14 items), as well as tables (39 items) and written materials (6 items), are used less frequently.
7. Openness: Openly formulated tasks are the exception: only a few subtasks allow for openness in the solution path (61 items) or the outcome (35 items); most subtasks (195 items) are fully closed.
8. Mathematical content: Of the 266 items, 57 are assigned to quantity, 37 to measurement, 2 to space and shape, 85 to functional relationships, and 86 to data and chance.
9. Modelling level: Only a limited number of subtasks (12 items) reach the highest modelling level, which requires students to critically reflect on the choice of model or solution strategy.

Taken together, assuming that modelling competences are best assessed when students engage with tasks meeting defined quality criteria, the results indicate that VERA-8 may in many cases capture students' modelling competence(s) only partially. Although all sub-competences are represented, some appear relatively infrequently. Missing, imprecise, and redundant data are likewise uncommon, and many tasks lack authenticity. Normative modelling is almost absent, and most tasks are entirely closed.
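To put these absolute counts into perspective, they can be expressed as shares of the 266 modelling-related subtasks. The following short Python computation uses a selection of the counts reported above; the selection and the rounding are ours, and owing to multiple coding the shares do not sum to 100%.

```python
# Selected counts from Section 5.1 as shares of the 266 subtasks.
N = 266
counts = {
    "holistic modelling required": 14,
    "mathematising": 178,
    "understanding, simplifying & structuring": 40,
    "fully closed": 195,
    "highest modelling level": 12,
}
for label, n in counts.items():
    print(f"{label}: {n}/{N} = {n / N:.0%}")
# holistic modelling required: 14/266 = 5%
# mathematising: 178/266 = 67%
# understanding, simplifying & structuring: 40/266 = 15%
# fully closed: 195/266 = 73%
# highest modelling level: 12/266 = 5%
```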
5.2 Enabling a Theory-based Assessment of Modelling Competence (RQ 2)
VERA-8 is well suited to the atomistic assessment of modelling competence, although some items also demonstrate that smaller modelling processes can be completed holistically. When participating in VERA-8, teachers receive quantitative results for their students at individual or group level (e.g., class or school) on the frequency of correct responses to specific tasks and competences. While qualitative analysis of students' responses is possible, it demands considerable effort from teachers, even with the support materials provided. In a computer-based format (already used in some of the 16 German federal states), such analyses could be supported by artificial intelligence. While data from paper-based assessments are static (i.e., coded as correct/incorrect), computer-based formats allow the capture of click patterns, the order of task completion, and response times (Hankeln et al., 2025).

Comparative assessments are primarily designed as summative instruments, though formative use becomes possible when testing is repeated over time. For example, in the federal state of Hamburg (Germany), pupils participate in a comparative assessment annually, and a student ID system enables the tracking of individual competence development (https://www.kermit-hamburg.de).

The German educational standards (KMK, 2004) emphasise a component-oriented (atomistic) description of modelling competences, and the test items align closely with the steps of the modelling cycle proposed by Blum and Leiß (2007). At the same time, VERA-8 incorporates subtasks that require holistic modelling, demonstrating that such an approach is, in principle, feasible. To evaluate modelling competence across developmental levels, items reflecting the highest proficiency levels must be included; VERA-8 does in fact contain such items, albeit only a few.

6 Discussion and Outlook
Regarding RQ 1, the findings suggest limitations in the extent to which VERA-8 tasks meet established quality criteria for modelling tasks. These findings are consistent with results reported for tasks used in central examinations in Germany and Austria (Greefrath et al., 2017; Siller & Greefrath, 2020). Regarding RQ 2, the findings show that VERA-8 offers several ways to assess modelling competence(s), which could be further enhanced through computer-based testing, particularly by analysing log files. However, comparative assessments must capture a broad range of competences and meet various design requirements (Drüke-Noe, 2012), meaning that certain trade-offs are unavoidable. Nevertheless, the study suggests that the discrepancy between curricular expectations and assessment practice can in principle be reduced, since VERA-8 already contains elements enabling theory-based assessments based on both atomistic and holistic approaches to modelling competence(s).

Several quality criteria should be given greater consideration in future test development. For example, task texts could contain additional information, or missing data could be simulated by including tables or texts (e.g., newspaper articles). When developing modelling tasks, greater emphasis could also be placed on sub-processes such as understanding, simplifying and structuring as well as interpreting and validating. Furthermore, more modelling tasks could be developed for the mathematical content areas measurement and space and shape.

References
Blum, W., & Leiß, D. (2007). How do Students and Teachers Deal with Modelling Problems? In C. Haines, P. Galbraith, W. Blum, & S. Khan (Eds), Mathematical Modelling: Education, Engineering and Economics (ICTMA 12) (pp. 222–231). Horwood. https://doi.org/10.1533/9780857099419.5.221
Bruder, R. (2005). Working with tasks for the learning of problem solving in maths teaching as an issue of the first teacher training phase. ZDM, 37(5), 351–353. https://doi.org/10.1007/s11858-005-0022-4
Cevikbas, M., Kaiser, G., & Schukajlow, S. (2022). A systematic literature review of the current discussion on mathematical modelling competencies: State-of-the-art developments in conceptualizing, measuring, and fostering. Educational Studies in Mathematics, 109(2), 205–236. https://doi.org/10.1007/s10649-021-10104-6
Drüke-Noe, C. (2012). Können Lernstandserhebungen einen Beitrag zur Unterrichtsentwicklung leisten? [Can learning assessments contribute to instructional development?] In W. Blum, R. Borromeo Ferri, & K. Maaß (Eds), Mathematikunterricht im Kontext von Realität, Kultur und Lehrerprofessionalität: Festschrift für Gabriele Kaiser (pp. 284–293). Vieweg+Teubner. https://doi.org/10.1007/978-3-8348-2389-2_29
Frejd, P. (2011). An investigation of mathematical modelling in the Swedish national course tests in mathematics. Proceedings of the Seventh Congress of the European Society for Research in Mathematics Education, 947–956. https://hal.science/hal-02158191/
Frejd, P. (2013). Modes of modelling assessment—A literature review. Educational Studies in Mathematics, 84(3), 413–438. https://doi.org/10.1007/s10649-013-9491-5
Greefrath, G. (2015). Problem Solving Methods for Mathematical Modelling. In G. A. Stillman, W. Blum, & M. Salett Biembengut (Eds), Mathematical Modelling in Education Research and Practice: Cultural, Social and Cognitive Influences (pp. 173–183). Springer International Publishing. https://doi.org/10.1007/978-3-319-18272-8_13
Greefrath, G., Siller, H.-S., & Ludwig, M. (2017). Modelling problems in German grammar school leaving examinations (Abitur) – Theory and practice. 932–939. https://hal.science/hal-01933483
Greer, B., & Verschaffel, L. (2007). Modelling Competencies—Overview. In W. Blum, P. L. Galbraith, H.-W. Henn, & M. Niss (Eds), Modelling and Applications in Mathematics Education: The 14th ICMI Study (pp. 219–224). Springer US. https://doi.org/10.1007/978-0-387-29822-1_22
Hankeln, C., Adamek, C., & Greefrath, G. (2019). Assessing Sub-competencies of Mathematical Modelling—Development of a New Test Instrument. In G. A. Stillman & J. P. Brown (Eds), Lines of Inquiry in Mathematical Modelling Research in Education (pp. 143–160). Springer International Publishing. https://doi.org/10.1007/978-3-030-14931-4_8
Hankeln, C., Kroehne, U., Voss, L., Gross, S., & Prediger, S. (2025). Developing digital formative assessment for deep conceptual learning goals: Which topic-specific research gaps need to be closed? Educational Technology Research and Development. https://doi.org/10.1007/s11423-025-10486-x
Henning, H., & Keune, M. (2007). Levels of Modelling Competencies. In W. Blum, P. L. Galbraith, H.-W. Henn, & M. Niss (Eds), Modelling and Applications in Mathematics Education: The 14th ICMI Study (pp. 225–232). Springer US. https://doi.org/10.1007/978-0-387-29822-1_23
Kaiser, G. (2007). Modelling and Modelling Competencies in School. In C. Haines, P. Galbraith, W. Blum, & S. Khan (Eds), Mathematical Modelling ICTMA 12: Education, Engineering and Economics (pp. 110–119). Horwood. https://doi.org/10.1533/9780857099419.3.110
Kaiser, G. (2020). Mathematical Modelling and Applications in Education. In S. Lerman (Ed.), Encyclopedia of Mathematics Education (pp. 553–561). Springer. https://doi.org/10.1007/978-3-030-15789-0_101
KMK. (2004). Bildungsstandards im Fach Mathematik für den Mittleren Schulabschluss: Beschluss vom 4.12.2003 [Educational standards in mathematics for the intermediate secondary school level: Resolution of 4 December 2003]. Luchterhand.
Krawitz, J., Chang, Y.-P., Yang, K.-L., & Schukajlow, S. (2022). The role of reading comprehension in mathematical modelling: Improving the construction of a real-world model and interest in Germany and Taiwan. Educational Studies in Mathematics, 109(2), 337–359. https://doi.org/10.1007/s10649-021-10058-9
Kuckartz, U. (2019). Qualitative Text Analysis: A Systematic Approach. In G. Kaiser & N. Presmeg (Eds), Compendium for Early Career Researchers in Mathematics Education (pp. 181–197). Springer International Publishing. https://doi.org/10.1007/978-3-030-15636-7_8
Maaß, K. (2006). What are modelling competencies? ZDM – Mathematics Education, 38(2), 113–142. https://doi.org/10.1007/BF02655885
Maaß, K. (2010). Classification Scheme for Modelling Tasks. Journal für Mathematik-Didaktik, 31(2), 285–311. https://doi.org/10.1007/s13138-010-0010-2
Meyer, W. J. (1984). Concepts of mathematical modeling. Dover.
Niss, M., Blum, W., & Galbraith, P. (2007). Introduction. In W. Blum, P. L. Galbraith, H.-W. Henn, & M. Niss (Eds), Modelling and Applications in Mathematics Education: The 14th ICMI Study (pp. 3–32). Springer. https://doi.org/10.1007/978-0-387-29822-1_1
Niss, M., & Højgaard, T. (2019). Mathematical competencies revisited. Educational Studies in Mathematics, 102(1), 9–28. https://doi.org/10.1007/s10649-019-09903-9
OECD. (2023). PISA 2022 Assessment and Analytical Framework. OECD Publishing. https://doi.org/10.1787/dfe0bf9c-en
Schindler, M., Simon, A. L., Baumanns, L., & Lilienthal, A. J. (2025). Eye-tracking research in mathematics and statistics education: Recent developments and future trends. A systematic literature review. ZDM – Mathematics Education, 57, 727–743. https://doi.org/10.1007/s11858-025-01699-8
Siller, H.-S., & Greefrath, G. (2020). Modelling Tasks in Central Examinations Based on the Example of Austria. In G. A. Stillman, G. Kaiser, & C. E. Lampen (Eds), Mathematical Modelling Education and Sense-making (pp. 383–392). Springer International Publishing. https://doi.org/10.1007/978-3-030-37673-4_33
Verschaffel, L., Schukajlow, S., Star, J., & Van Dooren, W. (2020). Word problems in mathematics education: A survey. ZDM – Mathematics Education, 52(1), 1–16. https://doi.org/10.1007/s11858-020-01130-4
Weinert, F. E. (2001). Concept of competence: A conceptual clarification. In D. S. Rychen & L. H. Salganik (Eds), Defining and selecting key competencies (pp. 45–65). Hogrefe & Huber.
Wirth, L., & Greefrath, G. (2024). Working with an instructional video on mathematical modeling: Upper-secondary students' perceived advantages and challenges. ZDM – Mathematics Education, 56(4), 573–587. https://doi.org/10.1007/s11858-024-01546-2