Asia-Pacific Forum on Science Learning and Teaching, Volume 5, Issue 2, Article 8 (Aug., 2004)
Winnie Wing Mui SO
Assessing primary science learning: beyond paper and pencil assessment

Different strategies for the assessment of science learning

The assessment strategies discussed here are mainly continual assessment strategies that allow teachers to understand the progress of their pupils. These strategies share common features that distinguish them from traditional strategies. First, they are less judgmental, and more descriptive in the information that they provide to both teachers and learners about avenues for improvement. Second, they are not concerned solely with correct or incorrect answers, but place more emphasis on how well pupils perform. These strategies provide a general picture of what pupils understand, what they are able to do, and how they apply the knowledge that they have learned.

There are a variety of strategies and opportunities for teachers to choose from in measuring the progress of different aspects of the science learning of individual pupils, some of which are more appropriate than others, depending on the area of science that is being covered and the age range of the pupils (Hollins & Whitby, 1998). The assessment strategies that are available to assess the science learning of primary pupils include performance-based assessment in science projects and investigations, science journal writing, concept maps, portfolios, and questions and answers. Hughes and Wade (1996) suggest that it is important that a variety of methods should be used, because pupils may demonstrate their abilities differently with different approaches. For example, some pupils may perform better in "public" tasks such as oral discussion, and others may do better in "private" tasks, such as writing.

a) Performance-based assessment in science projects and investigations

The message that science is not only a body of knowledge but also a way of working seems to have reached teachers, but has not yet trickled down to pupils (Goldsworthy & Feasey, 1994). Although the processes of science are stressed, the continuous emphasis on subject knowledge in assessment has not allowed pupils to grasp the equal importance of science knowledge and science processes.

Science investigations and projects require pupils to explore science issues that they are interested in, or to apply science knowledge in designing things or finding ways to solve problems in everyday life. Diffily (2001) suggests that any science topic can become the focus for an investigation or a project. Any group of elementary pupils can learn to reach a consensus about a topic to study, conduct research, make day-to-day decisions about locating resources, organize what is being learned, and select a way of sharing with others what they have learned. Farmery (1999) recommends that investigations should be chosen carefully for primary school pupils. They should be adequately resourced, be easily adaptable, and be relevant to the curriculum so that they are assessable.

So and Cheng (2001) find that the multiple intelligences of Hong Kong primary pupils are developed through science projects. Active participation in science projects can help to sharpen the observation and thinking skills of pupils, cultivate their creativity, strengthen their exploration and analytical skills, facilitate their understanding of the relationship between science, technology, and society, and promote their desire to invent and explore.

Reinhartz and Beach (1997) suggest that it is often helpful to develop a set of criteria, or a grading rubric, for the evaluation of the responses and performance of pupils with performance-based assessment tools. The two performance-based assessment rubrics that are suggested in Demers' (2000) article are here merged to provide a clear picture of how the progress of pupils in observation skills, classification, and other areas of performance might be assessed (Table 1).

The rubric describes each level of performance across five dimensions: process of inquiry, evidence of inquiry, depth of understanding, communication, and presentation. The four levels run from High to Low.

High
- Process of inquiry: Observations show evidence of careful study using multiple senses when appropriate. Descriptions contain intricate details. Classification systems clearly reflect the careful observations made, and include all samples provided.
- Evidence of inquiry: Questions are clearly identified and formulated in a manner that can be researched. Evidence and explanations have a clear and logical relationship. Methods of study generate valid data that address the question. Variables are controlled.
- Depth of understanding: Conclusions are based upon results and clearly explained. Scientific information and ideas are accurate and thoughtfully explained. Patterns and trends are identified, discussed, and extended through interpolation and extrapolation.
- Communication: Scientific information is communicated clearly and precisely, and may include expressive dimensions.
- Presentation: The presentation is effectively focused and organized. Sentences are both complex and grammatically correct. Core words are spelled correctly. Punctuation is used appropriately. Script is neat and easy to read.

Second level
- Process of inquiry: Observations show evidence of careful study but are relegated to one sense. Descriptions are clear enough for samples to be accurately identified by another scientist. Classification systems are based upon the observations made.
- Evidence of inquiry: Questions are clearly identified. Evidence and explanations have a logical relationship. Methods of study generate data that are related to the question. Variables are controlled.
- Depth of understanding: Conclusions are based on results. Scientific information and ideas are accurate. Patterns and trends are identified.
- Communication: Scientific information is communicated clearly.
- Presentation: The presentation is focused and organized. Sentences are grammatically correct. Most words, including the core words, are spelled correctly. Punctuation is used appropriately. Script is easy to read.

Third level
- Process of inquiry: Observations reflect the obvious characteristics of the samples provided. Descriptions lack intricate detail. Classifications do not necessarily reflect the observations made, and may not include all samples provided.
- Evidence of inquiry: Questions are implied. Evidence and explanations have an implied relationship. Methods generate data related to the question. Variables are not controlled.
- Depth of understanding: Conclusions are related to the data. Scientific information has occasional inaccuracies or is simplified. Patterns and trends are suggested or implied.
- Communication: Scientific information has some clarity.
- Presentation: The presentation has some organization and focus. Sentences make sense but may contain grammatical errors. Text includes frequent spelling and punctuation errors. Script is legible.

Low
- Process of inquiry: Observations lack clarity and detail, and are not clear enough to be interpreted by another scientist. The classification system is not based on observable characteristics and does not include all of the samples provided.
- Evidence of inquiry: Questions are unclear or absent. Evidence and explanations have no relationship. Methods generate questionable data. Variables are not controlled.
- Depth of understanding: Conclusions are unclear or unrelated to the data. Scientific information has major inaccuracies or is overly simplified. Patterns and trends are unclear or inaccurate.
- Communication: Scientific information is unclear.
- Presentation: The presentation lacks focus and organization. Incomplete sentences and grammatical errors render the text difficult to interpret. Spelling and punctuation errors are prevalent. Script is illegible.

Table 1: Performance-based assessment rubric (Source: Demers, 2000, pp. 27-28)

Teaching for progression in experimental and investigative science is very difficult (Crossland, 1998). Crossland attempts to show how an aide-mémoire, laid out on one side of A4 paper, helps teachers to focus their short-term planning in terms of the curriculum and formative assessment. At the same time, he also shows the "pupil contribution" component, which provides very useful guidelines for the assessment of the progression of pupils in experimental and investigative science (Table 2). In addition, Farmery (1999) explains the development of a model for ensuring progression in experimental and investigative science. Table 3 shows an extract from this model that demonstrates the possible progression in "obtaining evidence" in experimental and investigative science.

Level / Pupil contribution
Level 1

Observe using senses, talk, and draw

Help the teacher...
Observe...
Describe...
Talk about...
Level 2

Make comparisons between observations and expectations

Respond to suggestions and, with help, make their own...
Use the simple equipment provided ...
Describe and compare ...
Record (in simple tables)...
Compare results with expectations...
Level 3

With some help, carry out a fair test

Respond to suggestions and make their own ...
Simple predictions that can be tested...
Measure using a range of equipment...
With some help, carry out a fair test...
Record in a variety of ways...
Explain observations and any patterns arising out of their results ...
Say what they found out...
Level 4

Recognize the need to carry out a fair test through description or action

Make predictions with a reason based on similar experience ...
Recognize the need for a fair test by descriptions or actions ...
Select equipment ...
Make a series of observations/measurements adequate for the task...
Present findings clearly in tables and bar charts...
With help, plot graphs to find patterns and to relate conclusions to scientific knowledge and understanding...
Level 5

Carry out a scientific test in simple contexts involving only a few factors

Identify the crucial factors...
Prediction based on scientific knowledge and understanding...
Select and use apparatus with care...
Measure and record with care...
Ongoing interpretation of the results...
Present data as line graphs...
Draw conclusions that are consistent with the evidence...
Begin to relate conclusions to scientific knowledge and understanding...

Table 2: Pupils' contribution in experimental and investigative science (Source: Crossland, 1998, p.19)

Obtaining Evidence

Using equipment:
- Level 1: Children use familiar equipment independently, e.g., weighing scales, ruler. They use unfamiliar equipment with support.
- Level 2: Children use a range of equipment independently, e.g., ruler, tape, weighing scales, thermometer.
- Level 3: Children use a wide range of equipment independently and accurately.
- Level 4: Children use a range of equipment, including instruments with fine divisions.

Making observations:
- Level 1: Children make observations of at least one event within each part of the investigation.
- Level 2: Children understand the need for detailed observations.
- Level 3: Children begin to realize the need to make a series of observations.
- Level 4: Children make a series of relevant and detailed observations.

Taking measurements:
- Level 1: Children make measurements with some degree of accuracy.
- Level 2: Children identify and take relevant measurements.
- Level 3: Children recognize the need to make a series of measurements.
- Level 4: Children make a series of measurements with increasing precision.

Repeating and checking:
- Level 1: Children occasionally repeat measurements to check if they have the same results as someone else.
- Level 2: Children recognize the need to repeat measurements to check accuracy.
- Level 3: Children repeat measurements to check accuracy.
- Level 4: Children repeat observations and measurements and begin to offer simple explanations for any differences recorded.

Table 3: Pupils' progression in "obtaining evidence" in experimental and investigative science (Source: Farmery, 1999, p.14)

b) Science journal writing

In science journals, pupils record procedures and results from investigations and observations, as well as hypotheses and inferences about science phenomena (Lowery, 2000). Free writing and drawing can also be used when the concept area involves possible long-term changes and pupils should make regular observations (Hollins & Whitby, 1998). By creating journals, pupils are able to depict their way of seeing and understanding phenomena through their own lens of experience (Shepardson, 1997). The value of drawing and writing science lies in its potential to assist pupils to make observations, remember events, and communicate understandings (Shepardson & Britsch, 2000). Hollins and Whitby (1998) find that drawings and diagrams in response to a particular question are particularly revealing and informative when pupils add their own words to them; that is, annotations can help to clarify the ideas that a drawing represents.

Science journal writing, whether in words or drawings, captures a dimension of conceptual understanding that is different from other types of assessment. Science journals can serve as diagnostic tools for informing practice, because they convey the understanding of pupils and so provide a window through which to view this understanding (Doris, 1991).

Shepardson and Britsch (1997) examine the ways in which science journals serve as a tool for teaching, learning, and assessment. They also discuss what science journals can say about what pupils are learning (Shepardson & Britsch, 2000). However, Shepardson and Britsch find that journals written by pupils on the topic of mixing and separating five different materials - clay, silt, sand, pebbles, and gravel - give no indication that the pupils understood why the sand and pebbles could be mixed and separated; they only show that it happened. Thus, journal writing might indicate only that pupils have learned the activity, not that they have learned the science. Shepardson and Britsch therefore remind teachers to employ multiple modes of assessment.

The ways that are suggested by Shepardson and Britsch (2000) to assess journals are simplified here to help teachers use journals as a meaningful tool for the assessment of the science learning of pupils.

The assessment logs in Table 4, which have been adapted from Shepardson and Britsch (2000), can be used by teachers to monitor the performance of pupils in journal writing and drawing skills.

Assessment log / Pupil performance
- Representing science activity/understanding science: activity / science understanding
- Content understanding/science processes: content / processes
- Drawing/writing: drawing / writing / drawing and writing
- Graphic context: labeling / immediate observation
- Grammatical complexity of the writing: description / analogy / explanation
Table 4: Assessment logs to monitor the performance of pupils in journal writing and drawing skills (Source: Adapted from Shepardson & Britsch, 2000, p.32)

c) Concept maps

The use of concept maps in teaching and learning was initiated and developed by Novak and Gowin (1984). Concept maps measure or reflect more complex levels of thinking in the same way that science journals, science projects, science investigations, and other performance-based assessment methods do. In comparison with other assessment methods, however, concept maps are quicker, more direct, and considerably less verbal than essays or other types of written work. The visual nature of concept maps helps pupils to organize their conceptual framework (Willerman & MacHarg, 1991). White and Gunstone (1992) note that concept maps portray a great deal of information about the quality of learning and the effectiveness of the teaching. Stow (1997) states that concept mapping is a useful tool to help pupils to learn about the structure of knowledge and the process of knowledge production or meta-knowledge.

The use of concept mapping as an elicitation and assessment tool has been widely discussed (Atkinson & Bannister, 1998). Concept mapping has been shown to allow links to be made between concepts, and thus reveals both scientifically correct propositions and misconceptions. The concept maps that are devised by pupils reflect their own ideas and understanding, and so cannot be marked right or wrong (Comber & Johnson, 1995), even if their ideas do not match what is regarded as scientifically correct. Atkinson and Bannister (1998) have discovered that concept mapping can be a useful assessment tool, even with very young children.

By looking at the maps that were drawn by the pupils in Stow's (1997) article, it is possible to see how the pupils' understanding both of mapping and of the water cycle topic that is the subject of the maps has developed. One pupil drew a fairly well connected map before the investigations (Figure 2) that seems to show a mixed understanding of the concepts involved. After the investigations, the same pupil's map is significantly more sophisticated (Figure 3), and shows a far greater range of connections and a greater understanding of the grammar that is needed to complete the connections. It demonstrates a clearer understanding of the concepts that are involved; for example, evaporation is linked to condensation, and also to the sun. The motivational benefits of the comparison of the two maps and of the pupil's self-evaluation of their progress are clear. The opportunity that concept mapping provides for pupils to examine the progress of their own learning is instrumental in the encouragement of meaningful learning. The mapping and subsequent evaluation provide a framework of reference within which pupils can analyze their own thinking, which enables them to identify their strengths and weaknesses and to set themselves future learning targets.

Figure 2: A pupil's concept map before carrying out the activity (Stow, 1997, p.13)

Figure 3: A pupil's concept map after carrying out the activity (Stow, 1997, p.13)

Concept maps serve both formative and summative purposes in the assessment of student science learning. Over the past twenty-five years, concept maps have been adopted by thousands of teachers in elementary and secondary schools (Edmondson, 1999). The following are comments from science educators on the advantages of using concept maps as assessment tools.

Novak and Gowin (1984) suggest that teachers could construct a "criterion map" against which the maps of pupils could be compared, with the degree of similarity between the maps then given a percentage score. White and Gunstone (1992), however, argue that although scoring is not helpful for formative assessment, it becomes more sensible when concept maps are used in summative assessment. There are various other schemes for scoring concept maps, but most of them are variations of the scheme that is outlined by Novak and Gowin. Markham, Mintzes, and Jones (1994) modified Novak and Gowin's scheme to include three more observed aspects of concept maps for scoring: the number of concepts, which is evidence of the extent of domain knowledge; concept relationships, which provide additional evidence of the extent of domain knowledge; and branching, which provides evidence of progressive differentiation. Table 5 shows a summary of the schemes that are suggested by Novak and Gowin (1984) and Markham, Mintzes, and Jones (1994).

Novak and Gowin (1984)
- Validity of relationships: 1 point for each valid relationship
- Levels of hierarchy: 4 points for each level of hierarchy
- Validity of the propositions and cross-links: 10 points for each cross-link
- Use of examples: 1 point for each example

Markham, Mintzes, and Jones (1994)
- Concept relationships, which provide additional evidence of the extent of domain knowledge: 1 point for each valid relationship
- Hierarchies, which provide evidence of knowledge: 5 points for each level of hierarchy
- Cross-links, which represent evidence of knowledge integration: 10 points for each cross-link
- Examples, which indicate the specificity of domain knowledge: 1 point for each example
- Number of concepts, as evidence of the extent of domain knowledge: 1 point for each concept
- Branching, which provides evidence of progressive differentiation: 1 point for each branching, 3 points for each successive branching

Table 5: A summary of the schemes for scoring concept maps
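To make the arithmetic of these schemes concrete, the following sketch (an illustration only, not part of either source) computes a total score from the counts that a teacher would tally while reading a pupil's map; the example counts are hypothetical.

```python
# Illustrative sketch of the concept-map scoring schemes summarized in Table 5.
# The counts passed in are hypothetical; in practice a teacher tallies them
# while reading a pupil's map.

def novak_gowin_score(relationships, hierarchy_levels, cross_links, examples):
    """Novak and Gowin (1984): 1 point per valid relationship, 4 points per
    level of hierarchy, 10 points per cross-link, 1 point per example."""
    return relationships + 4 * hierarchy_levels + 10 * cross_links + examples

def markham_mintzes_jones_score(relationships, hierarchy_levels, cross_links,
                                examples, concepts, branchings,
                                successive_branchings):
    """Markham, Mintzes, and Jones (1994): as above, but 5 points per level
    of hierarchy, plus 1 point per concept and 1 point per branching
    (3 points for each successive branching)."""
    return (relationships + 5 * hierarchy_levels + 10 * cross_links + examples
            + concepts + branchings + 3 * successive_branchings)

# A hypothetical map: 12 valid relationships, 3 hierarchy levels,
# 2 cross-links, 5 examples, 15 concepts, 4 branchings (1 of them successive).
print(novak_gowin_score(12, 3, 2, 5))                      # 49
print(markham_mintzes_jones_score(12, 3, 2, 5, 15, 4, 1))  # 74
```

As White and Gunstone (1992) caution, such numerical totals are most defensible for summative comparison rather than for formative feedback.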

There are other suggestions for the scoring of concept maps. Trowbridge and Wandersee (1996) suggest a concept map "performance index," which they describe as a compound measure that includes the pupil's concept map scores, the difficulty level of each map produced, and the total number of maps submitted. Rice, Ryan, and Samson (1998) developed a method of scoring concept maps that is based on the correctness of the propositions that are outlined in a table of specifications of instructional and curriculum goals. They find high correlations between concept map scores and scores in multiple-choice tests that are aimed at assessing the same instructional objectives. Edmondson (1999) suggests that the scores for particular attributes of concept maps could be used as a basis for a comparison of the extent to which different dimensions of understanding have been achieved. The purpose of such assessment is for teachers to make adequate provision for pupils' learning to further develop their understanding.

Although there are many suggestions for the scoring of concept maps, there are also criticisms of these scoring systems. Regis, Albertazzi, and Roletto (1996) therefore suggest a shift in emphasis and focus toward the assessment of changes in the content and organization of concept maps over time.

d) Portfolio

Spandel (1997) asserts that any collection of student work, including tests, homework, and laboratory reports, can be included in a portfolio as representative samples of student understanding. Portfolios provide examples of individual student work, and can indicate progress, improvement, accomplishment, or special challenges (Lowery, 2000). A portfolio should be a collection of many meaningful types of materials that provide tangible proof of a pupil's progress (Reinhartz & Beach, 1997). As part of the portfolio exercise, Buck (2000) has pupils pick out their best work from a unit and describe what the pieces of work reveal about what they have learned. Vitale and Romance (2000) focus on the value of portfolios as measures of understanding in natural science, and further suggest that portfolios might be defined as collections of student work samples that are assumed to reflect meaningful understanding of the underlying science concepts. They highlight that portfolio activities and tasks are open-ended, and constructively require pupils to use and apply knowledge in ways that demonstrate their understanding of science concepts.

Portfolios are one of the assessment measures that were recommended in the recent curriculum reform in Hong Kong: "The portfolio is used to contain students' evidence of learning. During the processes, pupils make their own judgment and select the artifacts (observation sheets, questionnaire and interview results, art produced, etc.) that best meet the criteria for excellence and personal improvement" (Curriculum Development Council, 2000, p.16). As a form of authentic assessment, portfolios are considered by their advocates to offer teachers a more valid means of evaluating student understanding than traditional forms of testing (Jorgenson, 1996).

Within science classrooms, a wide range of products can be included as work examples in student portfolios. The emphasis should be on products that reflect the meaningful understanding, integration, and application of science concepts and principles (Raizen, Baron, Champagne, Haertel, Mullis & Oakes, 1990). These include reports of empirical research, analyses of societal issues from a sound scientific view, papers that demonstrate an in-depth understanding of fundamental science principles, the documentation of presentations that are designed to foster the understanding of science concepts for others, journals that address a pupil's reflective observations over an instructional time span, and analytic or integrative visual representations of science knowledge itself in the form of concept maps.

Vitale and Romance (2000) suggest the development of guidelines for the evaluation of portfolio assessment products. The evaluation of the portfolio by the teacher should be a clinical judgment with two considerations. The first is the degree to which the relevant conceptual knowledge is represented accurately in the portfolio product, and the second is the degree to which the portfolio product meets the specified performance outcomes, which include the degree to which the relevant concepts are used on an explanatory or interpretative basis by pupils. Thus, there is no need to develop numbered scoring systems or rubrics, because they are not specific enough to provide evidence of meaningful student learning.

e) Questions and Answers

Open-ended questions mimic good classroom strategies and encourage thinking (Lowery, 2000), both of which are helpful to teachers in understanding how pupils go about finding an answer, solving a problem, or drawing a conclusion. Hughes and Wade (1996) also suggest that both open-ended and closed questions might be asked to gain information about pupils' investigational abilities. Some examples of open questions are:

Hughes and Wade (1996) acknowledge the flexible nature of one-to-one or group questioning. These techniques enable supplementary questions to be asked to clarify what was really meant by a child's vaguely worded response, or to verify whether details omitted from a written account were due to forgetfulness, laziness, or a lack of understanding and ability.

However, Black and Wiliam (1998) opine that the dialogue between teacher and pupils that arises when the teacher asks questions is unproductive in the assessment of learning: questions may be lowered to the level of facts that require very little thinking time, and the dialogue may only ever involve a few pupils in the class. To enable thoughtful, reflective, and focused dialogue between teacher and pupils that evokes and explores understanding, Black and Wiliam (1998) suggest that teachers should:

In addition to questions from teachers, Watts, Barber, and Alsop (1997) assert that the questions of pupils can be very revealing about the way that they think, their worries and concerns, and what they want to know and when they want to know it. Gibson (1998) uses a similar technique, but with an emphasis on the answers of pupils to their own questions. The process of asking questions is emphasized, and the construction of meaning should be continuous. Asking pupils to generate questions on a regular basis also shows their development, as the questions really start to probe the big issues, or narrow the topic down to very specific queries. Gibson shows a sample of the range of pupils' answers in the article; this small selection of pupils' explanations clearly shows their thinking, and is possibly even more revealing than their questions. Gibson states that the answers that pupils give to their own questions can be a valuable learning and assessment tool. Although some of the pupils in the study showed a shallower understanding than others, the answers of all of the pupils give an insight into how they are developing. This "Any answer" session that Gibson suggests can follow on from sessions that are designed to generate questions, either before or after practical investigations, and will reveal the thinking of the pupils and their ability to form hypotheses.

 


Copyright (C) 2004 HKIEd APFSLT. Volume 5, Issue 2, Article 8 (Aug., 2004). All Rights Reserved.