Explicit Teaching and Scaffolding to Enhance Concept Learning by Design Challenges

This paper presents a mixed methods study, carried out among 21 first-year student teachers, that investigated learning outcomes of a modified Learning by Design (LBD) task. The study is part of a series of studies that aims to improve learning, teaching and teacher training. Design-based science challenges are reasonably successful project-based approaches for breaking down the boundaries between traditional STEM subjects. Previous learning outcomes of the extensively studied LBD approach demonstrated a strong positive effect on students' skills. However, compared to traditional classroom settings, LBD provided little profit on (scientific) concept learning. For this, according to two preliminary studies, a lack of explicit teaching and scaffolding strategies, both strongly teacher-dependent, bears a share of responsibility. The results of the third study discussed in this paper indicate that emphasizing these strategies strengthens concept learning without reducing positive effects on skill performances.

LBD and design composition of the challenge used for this study
LBD contains, as shown in Figure 1, two essential components for learning skills, practices and content: (re)design and investigation. Within these components are a variety of reflective hands-on and heads-on activities concerning design technology, science practices, public presentation, collaboration and teacher-guided class discussions. Students (operating in design groups) are faced with a design challenge where they first have to explore and establish what they need to know and learn in order to succeed. Through information seeking and experimentation they find answers to the questions raised and apply them in the design. Investigation of this application may lead to additional questions and reinvestigation. To incentivize the understanding of design-related principles and concepts, teacher-guided activities take place (poster and pin-up sessions, whole-class discussions and gallery walks). During these activities experiences and insights are shared among groups, feedback is given and science is made explicit. In general, LBD provides a constructivist learning environment where students experience the necessity to learn (Kolodner, Hmelo, & Narayanan, 1996), driven by the fact that students' pre-task conceptions are not sufficient for succeeding: design challenges deliberately address cognitive conflicts. A more scientific framework of knowledge is necessary to cope with these conflicts and reach conceptual change (Abdul Gafoor & Akhilesh, 2013; Cobern, 1994). Based on literature, e.g. Brandsford, Brown, Donovan, and Pellegrino (2003), LBD contains several elements that are beneficial to conceptual change: collaboration, reflection, contextual learning, applying what is learned, learning from failures and iteration, and connecting skills, practices and concepts.
For this study, an existing LBD challenge was modified for better concept learning. The challenge originated from Study 2, which also concerned first-year student teachers (science). Design groups (3 students per group, randomly chosen) were challenged to design a solar power system for a model house, shown in Figure 2, taking into account the design specifications in Table 1, which called for the use of underlying science, decision-making and creative thinking. Regarding these specifications and the scientific objectives in Table 2, the most fundamental design principles concerned proper wiring (combining series and parallel parts) and regulating current, voltage and resistance for maximum efficiency. Table 3 shows how the LBD elements in Figure 1 were applied, resulting in an activity, guided by an instructive presentation and a student's and teacher's guide, that took six periods of 90 minutes.
van Breukelen et al. Table 1 Design specifications and components

B. [Solar powering] The entire lighting has to be connected to a separate (combination of) solar cell(s). The same applies to the doorbell and washer-dryer combination.
C. [Efficiency] The energy efficiency of the entire wiring has to be as high as possible and, in addition, the use of materials has to be as low as possible. In any case, it is not allowed to use more components than available.
Van Meel, et al. (2016).

Table 2 Scientific objectives and interrelatedness with the challenge
DC electric circuits objectives (numbered) / Interrelatedness (•)
1. Physical aspects of electric circuits: resistance is a property of an object and hinders current flow (Ohm's Law); equivalent resistance in series increases and in parallel decreases as more elements are added; the necessity of a closed circuit to enable current flow; interpret pictures, diagrams, symbols of a variety of circuits.
• Resistors are necessary to reduce current flow and a variable resistor is necessary to adjust the washing machine's speed. Furthermore, students have to interpret and design a variety of circuit parts in order to meet the requested wiring.
2. Energy and power: apply the concepts of energy (dissipation, conversion and conservation) and power (work done per unit time) to a variety of circuits.
• Students have to establish the amount of energy supply and consumption by the designed circuit in order to reach maximum efficiency.
3. Current: understand and apply conservation of current (Kirchhoff's point rule) to a variety of circuits; explaining the behavior of an ideal current source.
• Combining series and parallel parts (solar cells and components) to meet design specification forces students to investigate and calculate current flow and potential differences. Furthermore, students have to investigate the behavior of (combined) solar cells to get informed about differences compared to (well-known) voltage sources.
4. Potential difference, voltage: the amount of current is influenced by potential difference; apply the concept of Kirchhoff's loop rule (ΣV = 0 around a closed loop); explaining the behavior of an ideal voltage source.
Note. Reprinted from Van Breukelen, Van Meel, et al. (2016).
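The objectives in Table 2 rest on a few standard circuit relations: Ohm's law, equivalent resistance in series and in parallel, and Kirchhoff's point and loop rules. The following is a minimal sketch of those relations only. The component values are hypothetical and are not taken from the challenge or from the students' designs.

```python
def series(*resistances):
    """Equivalent resistance of resistors in series: the values add up."""
    return sum(resistances)


def parallel(*resistances):
    """Equivalent resistance of resistors in parallel: the reciprocals add up."""
    return 1.0 / sum(1.0 / r for r in resistances)


def current(voltage, resistance):
    """Ohm's law, I = V / R."""
    return voltage / resistance


# Hypothetical circuit (assumption): a 6 V source feeding a 10-ohm resistor
# in series with two 20-ohm resistors that are connected in parallel.
r_parallel = parallel(20.0, 20.0)
r_total = series(10.0, r_parallel)
i_total = current(6.0, r_total)

# Kirchhoff's loop rule: the voltage drops around the loop add up to the
# source voltage, so the sum of all potential differences is zero.
v_series = i_total * 10.0
v_parallel = i_total * r_parallel

# Kirchhoff's point rule: the two equal branch currents add up to the total.
i_branch = v_parallel / 20.0

print(f"R_total = {r_total:.1f} ohm, I_total = {i_total:.2f} A")
print(f"loop check: {v_series + v_parallel:.2f} V, point check: {2 * i_branch:.2f} A")
```

Adding a further resistor to the parallel part lowers `r_parallel`, while adding one to the series part raises `r_total`, which is the behavior named in objective 1.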

Preliminary studies and task adjustments
The challenge discussed in the previous section was adjusted towards better concept learning for use in this study. The adjustments listed below mainly concerned explicit teaching and scaffolding strategies.
Backward design. The pre- and post-exam outcomes of the preliminary studies revealed that high-gain questions were strongly task-related and crucial for succeeding. Thus, detailed task analysis is important to unravel task-exposed and -underexposed concepts and to predict learning outcomes. Additional less directive concepts, complementing the knowledge domain, should be addressed otherwise (teacher-driven) through additional teaching interventions. This approach corresponds to the idea of backward design (Wiggins & McTighe, 2006), which states that education designers must begin to think about assessment and objectives before deciding what to do and how to teach. Regarding the solar challenge, initially designed for Study 2, there were four topics of underexposed science: (1) conceptual nature of resistance, (2) nature of electrical energy and energy dissipation, (3) behavior of current in components, (4) effect of voltage changes on circuit operation. To explore topics 1 through 3 students used simulation software, and the fourth topic was addressed by additional experimentation. All topics, addressed as interludes, were complemented by class discussions and didactic analogies for clarification. Topics 1 and 3 were addressed after experimentation (stage 3) because experimentation contained resistance-current measurements. Topic 2 was addressed after design testing (stage 6) because then efficiency calculations took place. Topic 4 was addressed after the final stage by replacing the solar cells in the final design with a traditional voltage source.
Guided discussion. For teacher guidance during class discussion the technique of guided discussion (Carpenter, Fennema, & Franke, 1996) was used to highlight and explicate underlying science. When students worked in groups they were challenged to think and make sense of what they were doing. Then, by observing students' thinking and doing it became clear what individual students understood about science. Based on this, the teacher made notes about which students should present their insights during class-discussion. This might concern insights that are incorrect but useful to initiate a discussion of common misconceptions. Eventually, more sophisticated insights are used as input to head for proper reasoning and understanding. Both inputs and class discussion provide the teacher with information about students' knowledge and (existing) cognitive gaps, whereupon better understanding can be obtained.
Informed design. Informed design aims to enhance students' prior knowledge through preparatory activities named knowledge and skill builders (KSBs) (Burghardt & Hacker, 2004). In this way, students are better prepared to approach design challenges from a more knowledgeable base and to tackle design problems by conceptual closure. Based on Study 2, a preparatory activity was created for this study around the behavior of solar cells. Students involved in Study 2 incorrectly assumed, without testing, that solar cells behave like (ideal) voltage sources. This assumption resulted in insignificant and time-consuming experimentation and finally trial-and-error behavior during design planning. To prevent this from happening, students had to do some information seeking during stage 2, accompanied by a class discussion regarding characteristics of (combined) solar cells.
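The distinction at stake in this misconception can be made concrete with the two idealized source models named in Table 2. The sketch below only contrasts an ideal voltage source with an ideal current source across a few loads. The numbers are hypothetical, and real solar cells are not ideal sources of either kind; how the cells used in the challenge actually behaved is not modeled here.

```python
def ideal_voltage_source(voltage, load_resistance):
    """Ideal voltage source: the terminal voltage is fixed and the current
    follows from Ohm's law. Returns (voltage, current)."""
    return voltage, voltage / load_resistance


def ideal_current_source(current, load_resistance):
    """Ideal current source: the current is fixed and the terminal voltage
    follows from Ohm's law. Returns (voltage, current)."""
    return current * load_resistance, current


# Hypothetical source values and loads (assumptions for illustration only).
for load in (5.0, 10.0, 20.0):
    v_vs, i_vs = ideal_voltage_source(3.0, load)
    v_cs, i_cs = ideal_current_source(0.2, load)
    print(
        f"R = {load:4.1f} ohm | voltage source: {v_vs:.2f} V, {i_vs:.2f} A"
        f" | current source: {v_cs:.2f} V, {i_cs:.2f} A"
    )
```

With the voltage source the current changes with the load while the voltage stays put; with the current source it is the other way round. A design that silently assumes the first model can therefore mispredict how a circuit responds when components are added.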
Explicit instruction and scaffolding. According to Archer and Hughes (2011) explicit instruction is characterized by a series of supports or scaffolds, where students are guided through the learning process with clear statements about the purpose of and rationale for learning activities. It embraces 16 instructional elements that aim for a systematic method of teaching with emphasis on proceeding in small steps, checking for student understanding, and achieving active and successful participation by all students. LBD takes account of most of the elements and the adjustments mentioned before also fit into explicit instruction. However, teacher guidance should also fit the educational setting. Design challenges face teachers with a new kind of classroom control (Wendell, 2008) where teachers must relinquish directive control (Burghardt & Hacker, 2004). Thus, teachers need to develop pedagogical strategies for guiding complex design-based science tasks (Bamberger & Cahill, 2013). Study 2 that investigated these strategies resulted in a framework of important teaching guidelines that were directive for teacher handling during this study.
Adjustment of the design diary. Students involved in Studies 1 and 2 criticized the amount of administration (design diary), mainly because little of it was necessary to learn or move on. For example, a lot of reflection was requested, but on too few occasions did this directly affect advancement. As a result, reflection became disturbing and abortive. Therefore, administration was reduced and many written proceedings were replaced by process pictures accompanied by short captions.
Methodology
Twenty-one first-year student teachers (science) took part in this design-based mixed methods study where they faced the improved solar challenge. All participants had prior experience with characteristic LBD elements and sufficient prior knowledge regarding the science domain. The challenge was guided by the principal investigator (teacher trainer) because of the relatively small number of participants. According to Crouch and McKenzie (2006) this offers the possibility to establish a sustainable relationship with participants and to provide added depth to the study, all resulting in an increased validity.
Quantitative data was collected to study students' progress in concept learning and video recordings were used to generate quantitative data about skill performances. Qualitative data was used to discover how task improvements affected concept learning by comparing students' comments to previous studies.

Data collection
To study a change in conceptual understanding the pre-post-exam developed for Study 2 was used. This multiple choice test is based on the validated Determining and Interpreting Resistance Electric Circuit Test (DIRECT) specially designed for use with high school and university students (Engelhardt & Beichner, 2004). The exam contains 46 items where each objective in Table 2 is served by multiple questions. Study 2 showed, by using a control group, that there was no task learning effect from just completing the test.
Study 1 showed that students mainly learned incomplete concepts and had difficulty in making proper knowledge connections and therefore did not achieve deeper conceptual understanding. This conclusion was partially based on multiple choice tests. According to Stoddart, Abrams, Gasper, and Canaday (2010) closed-ended tests like this often fail to measure conceptual understanding because students can easily make guesses and therefore knowledge structures remain invisible. Using concept maps is suggested as a more meaningful way of assessing conceptual understanding. Therefore, besides multiple choice testing, students were asked to create a concept map before and after the challenge. For this, a proposition-based concept map test was developed, based on Yin, Vanides, Ruiz-Primo, Ayala, and Shavelson (2005), where students had to create 16 fundamental propositions (a connection between two concepts by using linking words or phrases) within a set of 10 predefined task-related concepts. The selection of concepts and the number of propositions was based on a peer-reviewed expert map (Figure 3) designed by two experts. According to literature, proposition-based concept map tests, based on an expert map, appear to be superior to other mapping strategies in assessing conceptual understanding (Cañas et al., 2003; Rye & Rubba, 2002). It is important to note that Yin et al. (2005) established small task learning effects in some cases due to the development of mapping skills. Those effects will be minimal for this study because students are familiar with mapping techniques.
To study an increase in students' skill performances we chose and slightly adapted the approach used in previous LBD research by Holbrook, Gray, Fasse, Camp, and Kolodner (2001) in order to make comparison possible. Students were videotaped when working, partially in groups, on similar performance tasks before and after the challenge.
Tasks were taken from the Performance Assessments Links in Science Website database (SRI International Center for Technology in Learning, 1999) and were suitable for use with senior high school students (age 16-18), who are comparable to the first-year student teachers in this study. During the pre-task students had to determine the power dissipated in a combination of two resistors connected in series to a battery. The post-task concerned the determination of how well different wires radiate heat when voltage is applied across each wire. Both tasks included three parts: (a) students designed an experiment or procedure for fair testing, (b) students ran a specified experiment and collected data, and (c) students analyzed the data to draw conclusions and make recommendations. The videotapes were analyzed, also according to Holbrook et al. (2001), on seven science-related dimensions (Table 4): negotiations during collaboration, distribution of efforts and tasks, attempted use of prior knowledge, adequacy of prior knowledge, scientific reasoning, experimentation skills and self-checks. Because the dimensions contain a mix of individual and collaboration skills, each activity (a-c) started with an individual preparation, followed by a sharing session, and ended with task completion by teamwork.
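The physics behind the pre-task can be sketched in a few lines: for two resistors in series the same current flows through both, so the power dissipated in each follows from P = I²R. The battery voltage and resistor values below are hypothetical and do not come from the performance task itself.

```python
def series_power(voltage, r1, r2):
    """Power dissipated in each of two resistors connected in series to an
    ideal battery. Returns (p1, p2) in watts."""
    current = voltage / (r1 + r2)  # one current through both resistors
    return current**2 * r1, current**2 * r2


# Hypothetical values (assumption): a 9 V battery with 100-ohm and 200-ohm resistors.
p1, p2 = series_power(9.0, 100.0, 200.0)
total = p1 + p2

# Energy conservation check: the total dissipated power equals V * I.
supplied = 9.0 * (9.0 / (100.0 + 200.0))
print(f"P1 = {p1:.3f} W, P2 = {p2:.3f} W, total = {total:.3f} W, supplied = {supplied:.3f} W")
```

The larger resistor dissipates more power in a series connection, which is one of the points a fair-testing procedure for this task would have to bring out.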
Afterwards an open-ended questionnaire was used to investigate students' views on which activities stimulated or impeded concept learning. Questions were based on the STARR method that provides a framework for reflection on learning outcomes (Verhagen, 2011). By interpreting students' answers, also in the light of preliminary studies, it is possible to establish whether the improvements are appreciated or room for improvement is left. Open-ended questions were used to prevent students' views from being swayed by possible answers. To confirm that the questionnaires' data reduction and interpretations were fair (respondent validation) nine students, the number that made themselves available, were interviewed simultaneously. During this session also some remarkable differences and correlations regarding learning outcomes were discussed for deeper understanding.

Analysis
The multiple choice tests were scored per student by the proportion of correct answers among 46 items. Proportions were used to calculate the gain (g): the ratio of the actual average gain (post - pre) to the maximum possible average gain (1 - pre) (Hake, 1998). A paired samples t test and a Wilcoxon signed-rank test were performed to investigate differences between pre- and post-scores. This combination was used because literature indicates that for relatively small sample sizes using both tests increases the possibility to detect type I and II errors (Meek, Ozgur, & Dunning, 2007). Establishing Cronbach's alpha revealed the internal consistency of the exam.
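As an illustration of the gain and the paired comparison described above, the sketch below computes g = (post - pre) / (1 - pre) per student and a paired samples t statistic using only the Python standard library. The scores are invented for illustration and are not the study's data. The sign convention (pre minus post) is an assumption, chosen because it yields negative t values when scores improve.

```python
import math
from statistics import mean, stdev


def normalized_gain(pre, post):
    """Hake's gain: actual gain divided by the maximum possible gain.
    Scores are proportions in [0, 1]; undefined for pre == 1."""
    return (post - pre) / (1.0 - pre)


def paired_t(pre_scores, post_scores):
    """Paired samples t statistic on the differences (pre - post).
    Returns (t, degrees_of_freedom)."""
    diffs = [a - b for a, b in zip(pre_scores, post_scores)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical proportions of correct answers for four students (assumption).
pre = [0.30, 0.40, 0.35, 0.50]
post = [0.75, 0.80, 0.80, 0.85]

gains = [normalized_gain(a, b) for a, b in zip(pre, post)]
t, df = paired_t(pre, post)
print("gains:", [round(g, 2) for g in gains])
print(f"t({df}) = {t:.2f}")
```

With 21 participants the same computation gives 20 degrees of freedom. A Wilcoxon signed-rank test would be run on the same paired differences; it is left out here to keep the sketch free of third-party dependencies.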
The concept maps were scored per student. For this, all propositions (16 per concept map) were rated by two experts individually. Based on Yin et al. (2005) and Rye and Rubba (2002) the following scores were awarded: 3 points for a scientifically correct expert proposition (analogous to the expert map in Figure 3), 2 points for other correct propositions, 1 point for a weak or partially correct proposition and no points for incorrect propositions. Based on the experts' allocated scores the linear weighted Cohen's Kappa was calculated, which was sufficient. Then, the experts' average scores were assigned as final scores. Finally, the proportion scores (based on a maximum score of 48) were used, similar to the multiple choice test analysis, to calculate gains and to investigate pre-post-score differences.
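The scoring procedure above can be sketched as follows: two raters score each of the 16 propositions from 0 to 3, agreement is summarized with a linear weighted Cohen's Kappa, and the averaged ratings are turned into a proportion of the maximum score (16 × 3 = 48). The ratings below are invented, and whether the study computed kappa per map or over all propositions pooled is not stated, so the pooling here is an assumption.

```python
from collections import Counter

CATEGORIES = (0, 1, 2, 3)  # points per proposition in the scoring scheme


def linear_weighted_kappa(rater_a, rater_b, categories=CATEGORIES):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1).
    Undefined (division by zero) if the expected disagreement is zero."""
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    observed = Counter(zip(rater_a, rater_b))
    marginal_a = Counter(rater_a)
    marginal_b = Counter(rater_b)
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for a in categories:
        for b in categories:
            weight = abs(index[a] - index[b]) / (k - 1)
            observed_disagreement += weight * observed[(a, b)] / n
            expected_disagreement += weight * (marginal_a[a] / n) * (marginal_b[b] / n)
    return 1.0 - observed_disagreement / expected_disagreement


def map_proportion(rater_a, rater_b, max_points=3):
    """Average the two raters per proposition and divide by the maximum score."""
    averaged = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]
    return sum(averaged) / (max_points * len(averaged))


# Hypothetical ratings for one concept map with 16 propositions (assumption).
rater_a = [3, 3, 2, 2, 1, 0, 3, 2, 1, 1, 0, 3, 2, 2, 3, 1]
rater_b = [3, 2, 2, 2, 1, 0, 3, 3, 1, 0, 0, 3, 2, 1, 3, 1]

kappa = linear_weighted_kappa(rater_a, rater_b)
proportion = map_proportion(rater_a, rater_b)
print(f"weighted kappa = {kappa:.2f}, proportion score = {proportion:.2f}")
```

The proportion scores obtained this way can be fed into the same gain and pre-post comparison used for the multiple choice test.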
Analysis of the videotaped performance assessments took place by using a scoring rubric (see Appendix) where each performance dimension was served by a 5-point rating scale (1-5), with 5 being the highest level/score. Although the rubric's scale and dimensions are similar to those used by Holbrook et al. (2001), the level descriptors were adjusted for more validity. The original rubrics assessed skill performances by capturing the extent to which students in a group participated in practicing a skill: if more students were actively involved, the group got a higher rating. According to Jonsson and Svingby (2007) this (possibly) causes validity problems because this method fails to reveal the quality of students' individual performances. Because a well-validated rubric, matching all our skill dimensions, was not available, a rubric was created by combining existing rubrics. For this, we used an available qualitative framework of criteria to guarantee an acceptable level of validity, because a more sophisticated approach, achievable within this study, is still in its infancy (Baartman, Bastiaens, Kirschner, & Van der Vleuten, 2006; Moskal & Leydens, 2000). In short, rubrics comprising the following properties were selected: applicability to a 5-point scale, level descriptors based on observable behavior of individuals, univocal descriptors that actually reflect the skill dimension, some degree of validation, development based on experiences, expert involvement and suitability for the target group. Based on the final rubric two experts rated students' skill competences individually whereupon, after establishing an acceptable Cohen's Kappa, the experts' average ratings were assigned as final scores. Differences between pre- and post-ratings were also tested by paired samples t tests and Wilcoxon signed-rank tests.
To investigate the strength of the relationships between pre- and post-assessment variables the Pearson product-moment correlation coefficient was computed for all possible combinations of variables. It is particularly interesting to find out how the multiple choice test and concept map test are correlated because they both concern conceptual knowledge. Furthermore, it reveals which skills strongly interact with concept learning.
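A minimal sketch of this step, computing the Pearson product-moment correlation for every pair of variables, is given below. The variable names and values are hypothetical placeholders, not the study's data, and significance testing is not included.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_pairs(variables):
    """Correlate every unordered pair of variables in a dict of name -> scores."""
    names = list(variables)
    return {
        (a, b): pearson_r(variables[a], variables[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical per-student values for three variables (assumption).
data = {
    "mc_gain": [0.60, 0.70, 0.65, 0.75, 0.55],
    "map_gain": [0.40, 0.55, 0.45, 0.60, 0.42],
    "self_checks": [3.0, 3.5, 4.0, 3.0, 3.5],
}

for pair, r in correlation_pairs(data).items():
    print(pair, round(r, 3))
```

For m variables this yields m(m - 1)/2 coefficients, which is why a correlation table grows quickly as assessment dimensions are added.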
For questionnaire analysis, to categorize responses, we first distinguished between positive and negative opinions on the process of conceptual learning. After this, within these categories common themes were grouped and tagged by a description, resulting in sub-categories of impeding or stimulating properties. Finally, all questionnaires were re-read to make sure all responses were categorized properly. During the group interview all properties were discussed and, based on students' input, slightly customized or supplemented. Finally, remarkable differences and correlations regarding assessment outcomes were accompanied by a uniform group opinion on how to interpret results. For theoretical underpinning of students' opinions the scientific literature was searched.
Table 4 gives a complete overview of all pre- and post-assessment results per student including mean scores and standard deviations. For the multiple choice test the average Cronbach's alpha, based on individual objectives in Table 2, is 0.72 for the pre-test and 0.69 for the post-test. The linear weighted Kappa values for the concept map and performance assessment analysis are shown in Table 5. Thus, in case of all assessments the reliability is sufficient.
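For readers unfamiliar with the reliability measure reported here, the sketch below shows the standard Cronbach's alpha formula, alpha = k/(k - 1) × (1 - Σ item variances / variance of total scores), applied to a small invented set of right/wrong item scores. How items were grouped per objective and how the per-objective alphas were averaged in the study is not reproduced here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows, one row of item scores per student."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))
    item_variance_sum = sum(variance(column) for column in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five students on four items, 1 = correct (assumption).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Sample variances are used for both the items and the totals, so the ratio is unaffected by the choice between sample and population variance as long as it is applied consistently.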

Results
The conceptual learning gains in Table 4 for the multiple choice test are significant, t(20) = -30.87; p < 0.001, just as for the concept map test, t(20) = -24.58; p < 0.001. This is confirmed by the Wilcoxon signed-rank test that gives the same p value for both tests. The mean gain for the multiple choice test (0.68) significantly increased compared to Study 2 (0.49) and Study 1 (0.35) and exceeds conceptual gains (LBD) found by Holbrook et al. (2001) that revealed mean gains up to 0.40. Compared to a large previous survey of pre-post-test multiple choice data for physics courses (Hake, 1998), which showed maximum gains between 0.60 and 0.70, our students were equally successful. Remarkably, the highest gains found by Hake (1998) resulted from interactive engagement (IE) methods designed, similar to LBD, to promote conceptual understanding through heads- and hands-on activities supported by (peer) feedback, collaboration and intensive teacher guidance.
Incidentally, a critical comment should be made because the concept map tests showed significantly lower (p < 0.001), but still substantial, gains (mean gain = 0.49; lowest gain = 0.37; highest gain = 0.64). This will be discussed in more detail later on.

Table 5 κw for concept map and performance assessment ratings
Test | κw pre-ratings | κw post-ratings
Concept map test | 0.68 (lower limit = 0.62; upper limit = 0.74) | 0.65 (lower limit = 0.58; upper limit = 0.72)
Performance assessment | 0.62 (lower limit = 0.52; upper limit = 0.72) | 0.66 (lower limit = 0.58; upper limit = 0.74)

The performance assessment results in Table 4, shown graphically in Figure 4, indicate that all skill dimensions show an increase in achievement level, where the highest progressions concern the adequacy of prior knowledge, experimentation skills and self-checks. However, all improvements are significant (p < 0.001) and fairly comparable to the performance assessment results found by Kolodner, Camp, et al. (2003). Those results showed scores between 2.00 and 3.00 for honors non-LBD students (the category befitting the students in our study at pre-testing) and scores up to 4.00 for typical LBD students (students exposed to LBD). Overall, students in this study reached, compared to previous LBD studies, much higher conceptual learning gains while advancement in skill performances was not hindered.
According to Table 6 there were strong (significant) positive correlations between the pre-scores of the multiple choice and concept map test, as well as between the post-scores. The gains of both tests showed a lower, but fair, correlation (r = 0.683, n = 21, p < 0.01), which can be explained by the fact that the mean gain for the concept map test was significantly lower than that for the multiple choice test. According to Constantinou (2004) multiple choice test and concept map test results vary to a greater or lesser extent depending on which kind of learning is assessed through the multiple choice test (e.g. rote learning or meaningful learning). In general, Ruiz-Primo, Schulz, and Shavelson (1997) state that the correlation between both tests should be positive because they measure the same knowledge domain but the magnitude may differ. The interviewed students all agreed that the concept map test was more difficult because it appealed more strongly to mastering well-organized, relevant knowledge structures. Furthermore, Table 6 shows moderate or strong positive correlations between the conceptual tests and three dimensions of the performance assessment (use and adequacy of prior knowledge and scientific reasoning) that also positively correlated with each other. Other positive or negative correlations between variables were not found or appeared to be weak or occasional. These findings correspond to previous findings: Schreiber, Theyßen, and Schecker (2016) found high correlations between conceptual tests and the preparation and evaluation of experiments, where prior knowledge and scientific reasoning are important, and low correlations with respect to conducting the experiment by following the rules for fair experimentation.
Stone (2014) states that general skills, like collaboration and reflection, have a limited interconnectedness with more science-specific skills (practices) and the knowledge domain, where Zimmerman (2000) explicitly mentions the weak relation between conducting reception experiments and mastering conceptual knowledge and strong relations between conceptual knowledge, prior knowledge and scientific reasoning. All these insights reflect our findings, where the interviewed students also emphasized the concept-free character (with respect to science knowledge) of collaboration, reflection and conducting a prescribed experiment. On the other hand, students compared the use and adequacy of (prior) knowledge in combination with scientific reasoning to the mental activity important for creating a concept map, which reflects the mastering of knowledge structures.
Table 4; PA = performance assessment; *p < 0.05; **p < 0.01
Table 7 shows the results of the questionnaire analysis, where the number of positive replies largely exceeds the negative ones. According to students, activities that directly appeal to underlying science (explicit teaching and experimentation) are invaluable for concept learning when complemented by sufficient teacher and task guidance (feedback, clear instructions and transparency). These results are consistent with the results of Studies 1 and 2. It is, however, surprising that in this study learning from peers is clearly more appreciated. This may be because this study revealed less trial-and-error behavior and therefore students acted more like a role model, or because the guided discussion approach, where the use of students' insights is directive, clarifies that peers are an important learning source. Although interviewed students seemed to confirm both statements, a solid validation failed to appear.
Clear instructions and transparency of tasks and objectives 12
Other (e.g. false information sharing, task duration) 11
Other (e.g. reflection, information seeking, the design context) 11
Note: Perc. = relative distribution (%) of all replies within each category from which the corresponding sub-categories were derived. Descriptions were redefined based on the group interview.
Taking the impeding factors into account fragmentation is an issue. First, students experienced too little coherence in addressed science and, second, students described the number of stages and accompanying administration as disruptive to the ongoing learning process. Also some students missed assimilation of addressed science for anchoring. Compared to the preliminary studies, the initial problems of addressing an incomplete science domain and a lack of science explication seem to be tackled. However, despite the improvements, coherence still is an issue and the amount of administration is still disruptive.

Discussion and implications
The adjustments, deduced from two preliminary studies, to enhance concept learning by design challenges appear to be successful because this study reveals a solid improvement of conceptual learning gains without reducing a positive effect on skill performances. This holds especially when the multiple choice test results (the assessment form used in all studies) are taken into account: gain-indices increased from the lower limit of medium (> 0.30) up to the very margin of high (0.70), where the latter is more or less reserved for the most successful physics-related courses (Hake, 1998). Students' responses show, which can be considered an important reason for improved concept learning, that in contrast with the preliminary studies few comments were made about a lack of (explicit) science teaching. It seems that a combination of backward design, guided discussion and informed design is an appropriate remedy to enhance concept learning by extending strongly task-driven concepts and further deepening of all concepts. This happens, first, by introducing additional teacher-driven concepts (weakly task-driven) to complement the knowledge domain, which is important for understanding individual concepts, and second, by explicating and deepening all addressed science (explicit teaching). Figure 5 illustrates how contributions to conceptual learning gains may combine where, of course, nearly always room for further improvement is left. This study provides some interesting clues where to search for further improvement. First, this study reveals significant positive correlations between students' conceptual performances, the use and adequacy of (prior) knowledge and scientific reasoning. Second, although students reached substantial conceptual learning gains, the concept map test gains were significantly lower than the multiple choice test gains.
Third, students compared the use and adequacy of (prior) knowledge in combination with scientific reasoning to the mental activity important for creating concept maps. Fourth, students mentioned the fragmentation of addressed science and a lack of deeper assimilation of addressed science as important shortcomings. Thus, combining all four, more coherence of addressed science may be valuable because mastering explicit interrelationships between domain concepts enhances learning (Brandsford et al., 2003;Wiggins & McTighe, 2006). This may also improve the adequate use of knowledge and scientific reasoning and, with this, meaningful learning (important for concept mapping).
Beside fragmentation of addressed science, according to students, the same comment applies to the task itself. Although students experienced sufficient guidance and task transparency they described the number of stages and accompanying administration as disruptive to the ongoing learning process. Maybe a reduction of the number of (separate) stages and activities, through amalgamation, offers more coherence and less administration where guidance and scaffolding is shifted towards the ongoing process itself rather than breaking down into parts.
To conclude, both aspects of fragmentation, as discussed before, will be a main topic for further research. However, in general this study revealed some more interesting research themes. First, it is interesting to study the interaction between skill and concept learning in more detail because both types of learning are (partly) correlated and may strengthen each other. All the more because learning (STEM) skills is regarded as an important goal for modern education driven by a complex world economy that demands those skills (ICF and Cedefop for the European Commission, 2015). Second, more insight is needed into the creation, use and validation of rubrics to assess skills. Third, correlations between multiple choice tests and concept map tests are often significant but vary widely (Constantinou, 2004). Therefore it is necessary to investigate this correlation in more detail and to find out how conceptual knowledge can be assessed properly depending upon the learning objectives.