Sameness and Difference in Transfer

Ference Marton

In order to grasp a general principle by handling an instance of that principle, the general principle must be discerned by separating it from the specific instance in which it is embedded. But how can such a separation be brought about? This is actually the classical problem of general qualities and specific instances. The former are what dwell in Plato's world of pure ideas, whereas the latter are what populate our world. How can we tell the two apart? For example, how can we distinguish “threeness” from three apples, the idea of serendipity from a serendipitous event, or, as in Gick and Holyoak's (1980) case, the idea of simultaneously converging paths from destroying a tumor in the stomach by radiation?

As I have argued in relation to the Greeno et al. (1993) and Lobato and Siebert (2002) studies, discernment or separation can hardly come about by focusing on one instance only, in which the general and the specific are completely intertwined; the former implicit, the latter explicit. If there are two sufficiently different instances of the same principle, what is common (the principle) may possibly be discerned from what is different (the instances). The more different cases available to the learner, the greater the likelihood that the principle will be discerned as the primary, or only, thing common to all, because of the likelihood of ruling out what is different (Reeves & Weisberg, 1994). The likelihood of distinguishing the principle from its specific instantiation can also be enhanced by drawing learners' attention to the commonalities between the different instances.

The main point is that in order to discern the general principle to be used in the second problem, empirically at least two examples are needed. The traditional idea of transfer (learning something in Situation A, that is, discerning a general principle, and using it in Situation B) is logically untenable. As all the transfer experiments quoted by Lave (1988) demonstrated, the first instance in which learners discern a principle may actually occur when they are dealing with Situation B rather than Situation A. From Lave's point of view, this can be seen as an example of local construction of the solution (i.e., the solution does not exist prior to the problem being solved). From the point of view of my own line of reasoning, this is an example of the necessity of variation. Without different instances (at least two) the learner is most unlikely to become aware of the general principle. In fact, in a follow-up of their study, Gick and Holyoak (1983) showed that although learners had little success in abstracting (or separating) generalized solutions from single specific instances, they frequently managed to do so when dealing with two different instances. One could say, of course, that several examples make it possible for the learner to discern the general principle because the examples all embody the same general principle. This is true, provided the examples are different. Using the same example twice or several times instead would not do.

Preparation for Future Learning

As stated, this article is intended to challenge the doctrine of sameness, which is the idea that when learners profit in a new situation from what they have learned in an earlier situation, they do so because they make use of the same capability in relation to the same features of the two situations. I am going to use a recent study as the basis for my second argument. This study was chosen because it differs in important respects from the studies discussed so far. The study in question belongs to a set of studies oriented toward the preparation for future learning (e.g., Bransford & Schwartz, 1999; Schwartz & Bransford, 1998). In these studies the object of research is the effect of learning in one situation on learning in a second situation and, through that, on achievement in a transfer situation. Transfer has also previously been studied in terms of the effects of learning in one situation on learning in another situation, but the preparation for future learning studies have unique features that are essential to the point I want to make.

The point of departure in these studies is the standard transfer paradigm aimed at comparing the transfer effect of two different conditions for learning. In a traditional transfer approach, two comparable groups of students try to learn what is nominally the same thing under two different conditions, and they are tested for what they have learned under the same condition novel to both. To the extent that there are differences, they are in transfer from the two different conditions for learning to the same condition for appraisal.

Now, it might be the case that no differences are found between the two conditions. One of the conditions might, however, be better at preparing the learners for future learning, even if it does not yield superior achievement directly in a standard transfer design study. In order to investigate this, Bransford and Schwartz (1999) suggested the use of what Schwartz and Martin (2004) called “the double transfer” design. Students were assigned to one of two instructional treatments (Situations A1 and A2). Half of the students from both treatments were given access to a common learning resource (Situation B), such as a lecture or a sample worked-out problem, followed by a request to solve a transfer problem (Situation C). The other half from both treatments were asked to solve the transfer problem (Situation C) directly without access to the learning resource. The researchers called this a double transfer paradigm because students needed to “transfer in” what they had learned from the instructional treatment to learn from the resource, and they needed to “transfer out” what they had learned from the resource to solve the target transfer problem. To the extent that there were differences, they were between the two conditions (Situations A1 and A2) as far as their direct effect on the transfer task (Situation C) and their effect on learning in Situation B are concerned. The differences might thus have originated from two kinds of transfer effects. What follows is an example of how such a design has been used in a specific study.
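
The four groups in such a design can be laid out explicitly. The following is a minimal sketch, not something taken from the studies themselves: the labels A1, A2, B, and C come from the text above, whereas the function name `sequence` and the dictionary `design` are hypothetical names introduced only for illustration.

```python
from itertools import product

# Labels taken from the text: two instructional treatments, an optional
# learning resource (B), and a common transfer problem (C).
treatments = ["A1", "A2"]
resource_given = [True, False]


def sequence(treatment, gets_resource):
    """Return the ordered situations one group passes through (hypothetical helper)."""
    steps = [treatment]
    if gets_resource:
        steps.append("B")  # "transfer in": learn from the resource
    steps.append("C")      # "transfer out": solve the target problem
    return steps


# Crossing treatment with resource access gives the four groups.
design = {(t, r): sequence(t, r) for t, r in product(treatments, resource_given)}

for group, steps in design.items():
    print(group, " -> ".join(steps))
```

Comparing the two groups that skip B corresponds to a standard transfer design; comparing the two groups that pass through B is what may reveal differences in preparation for future learning.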

Schwartz and Martin (2004)

In this study, the object of learning (what was to be learned) was comparing (high) scores on different scales from different distributions, operationalized in terms of the target task given after instruction. The following task was utilized: “which of two students, who were in different biology classes and took different tests, did better on their respective test?” Such problems are commonly solved in statistics by using standardized scores (i.e., by dividing the difference between the actual score and the mean for the distribution that the score belongs to by the standard deviation for the same distribution). In order to grasp the rationale underlying this transformation, the ninth-grade students participating in the study had to notice two critical respects in which distributions might differ from each other, namely central tendency and variability. [There are similarities with the children in Judd's (1908) experiment, who had to notice the angle of refraction and the depth of water as two critical differences between the two tasks they were dealing with.]

After having dealt with statistics for 2 weeks (all in the same way), the students were presented scenarios, such as the following: “who broke the world record by the most impressive amount: John in high jump or Mike in javelin throw?” In one of the instructional treatments (Situation A1), the students were asked to invent their own way of solving the problem given raw data for the distributions of the best results during the year. In the other instructional treatment (Situation A2), the students were shown a graphical method for standardizing the scores by using histograms and a method for comparing the standardized scores subsequently. They kept practicing this method under the supervision of the teacher. Half of both treatment groups also received a learning resource consisting of two worked-out examples (Situation B). The learning resource was intended to provide the students with a tool for dealing with the target transfer problem. The students were instructed how to compute and compare standardized scores (e.g., “Is Betty better at assists or steals?”). Finally, all of the students received the target transfer problem, namely to compare scores on two different biology tests (Situation C). In this case not all the raw data were provided, but only the means and standard deviations for the two tests along with raw scores for the two students to compare.
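
The computation that the target problem calls for can be sketched as follows. The numbers are hypothetical and are not data from Schwartz and Martin (2004); the function name `standardized_score` is likewise introduced only for illustration. The sketch assumes, as in Situation C, that only the mean and standard deviation of each test and the two raw scores are available.

```python
def standardized_score(raw, mean, sd):
    """Return how many standard deviations `raw` lies above the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd


# Hypothetical values for illustration only; NOT data from the study.
score_a = standardized_score(raw=82, mean=76, sd=4)  # student in class A
score_b = standardized_score(raw=88, mean=80, sd=8)  # student in class B

if score_a > score_b:
    verdict = "A did better relative to the class"
elif score_b > score_a:
    verdict = "B did better relative to the class"
else:
    verdict = "the two did equally well"

print(score_a, score_b, verdict)
```

In this made-up case the student with the lower raw score (82) comes out ahead, because the comparison takes both central tendency (the mean) and variability (the standard deviation) into account. These are the two respects in which the distributions differ and which the students had to notice.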

There was no significant difference between the two instructional treatments for the students who did not receive the learning resource. However, the additional component of the worked-out examples in Situation B yielded a striking difference, namely a strong effect for the “invent a solution strategy” group (Situation A1) and little or no effect on learning for the group that was simply shown a graphical solution (Situation A2).

The experiment shows that although the students' attempt to invent a solution for the first problem (Situation A1) did not have any advantage for solving the target problem (Situation C) as compared to telling the students a way of solving it (Situation A2), the attempt had a definite advantage when it came to learning to solve the target problem (Situation B). So the differential effect on the transfer problem was not due to the first part of the experiment (the instructional treatment) nor to the second part (exposure to a learning resource). The effect was due to the relationship between the first and second parts. The effect of the first part was thus contingent on the second part. If we denote the three parts A (by collapsing Situations A1 and A2 into each other), B, and C, we can conclude that no differential effect of A on C and no differential effect of B on C can be identified. The effect is contingent on what other tasks the learners may encounter. So the effect of a task on another task can only be determined specifically in the context of a given set of tasks. In accordance with Lave's (1988) argument for considering sets of situations, instead of only two situations at a time, changing the set may imply that the effect of a given task on another task will change as well.

But what is the nature of the relationship between Situations A and B? None of the students in the invention group came up with the canonical solution of the problem (standardized scores), whereas students in the tell-and-practice condition group were shown something close to that. Yet, there was a greater positive effect of A1 and B on C as compared to the effect of A2 and B on C. An important element in the invention approach was the use of contrasting cases. As mentioned above, the students were supposed to compare raw scores and distributions to which those scores belonged. By doing so the students tried to determine how the distributions differed from each other, and they then noticed critical differences between them, even if they could not capture those features and their relations to each other explicitly in technical language. If the differences between cases provided the students with one kind of contrast, juxtaposing their own vague ideas about central tendency and variability with the standard statistical versions of the same phenomena provided them with another contrast.

As has been the case with many studies in the preparation for future learning paradigm, Schwartz and Martin (2004) made use of differences (contrasts) on two levels: between the conditions inventing (Situation A1) and being told (Situation B), on the one hand, and within condition A1 (using contrasting cases in inventing), on the other. By encountering differences between cases, the learners noticed differences that were critical for distinguishing between them.

This line of reasoning leads to the conclusion that what was learned in this case was the increased sensitivity to perceive certain situations in certain ways. That is, students developed a capability to discern aspects of the situation as critical or relevant and take them into consideration at the same time. Additionally, they began to recognize relationships between those aspects as relevant. This implies that the acts of discernment (that which is learned) and what is discerned in the situation (the features of the situation toward which the acts are directed) are not separate.

One cannot discern without something being discernable, nor can anything be discerned without an act of discernment. Accordingly, learning about central tendency or variability, for instance, amounts to learning to discern central tendency and variability in the distributions one encounters in the future.

Why Understanding of One Theory is Necessary for the Understanding of Another Theory

I can go further and argue that it is actually impossible to grasp anything without having experienced an alternative option. It is, for instance, difficult, if not impossible, to understand a theory without having come across an alternative theory of the same phenomenon. How could one otherwise distinguish between the theory and the phenomenon? But this means at the same time that it is impossible to understand the first theory of a phenomenon that we encounter. Arguments for having history of science as part of the science curriculum are in line with this claim: thus, by contrast (and paradoxically for many), learning about the Aristotelian paradigm of force-motion relationship fortifies understanding of its Newtonian counterpart, the Cartesian interpretation of weight helps students to understand Newtonian gravitation, the idea of absolute space-time reveals the meaning of the relativistic conception, the geocentric world system facilitates understanding of the heliocentric model... (Tseitlin & Galili, 2004).

Aristotle and Newton

Let us examine briefly the first of these examples. According to Newton's first law of motion, a body remains at rest or continues to move with the same velocity unless a force acts upon it. This can be illustrated by a spaceship moving from the earth to Venus with all of its engines switched off. If this were all we focused on, Newton's law would not even seem a law, but rather a straightforward empirical generalization, much like the observation that things fall when we drop them. In order to perceive the significance of Newton's formulation, we have to step outside it. In fact, looking at what is happening on the earth instead of looking at what happens in space would do. If a driver switches off the car engine while traveling at 50 km/hr and then declutches, she will notice that the car does not continue to travel at 50 km/hr. It slows down and soon comes to a halt. As no force appears to act on the car, we may conclude that Newton must be wrong. But then someone might remind us that air resistance is one force that acts on the car running forward with the engine switched off, and another is friction in the axle and on the road. It is these forces that make the car stop. If there were no air resistance and no friction, the car would continue to move forward at 50 km/hr. There is always air resistance, and there is always friction in the world in which we live. By juxtaposing Newton's first law with our everyday experience, we can conclude that far from making a rather trivial observation, Newton saw and formulated a principle that cannot be seen directly at all on earth. Our everyday experience is more in line with the Aristotelian idea that in order to make something move we need a force. Now, this principle probably seems straightforward to most of us, yet it contradicts the Newtonian formulation. How is this? What Aristotle says is that if a body is at rest, it takes a force to make it move. This is true. But the principle is overgeneralized if we believe that if something is moving there must be a force that keeps it moving.

The difference between Aristotle and Newton is that the former tries to explain the difference between rest and movement, whereas the latter tries to explain differences in velocity, where rest can be seen as a special case of zero velocity. Change in velocity is called acceleration (or deceleration). Change in velocity, for example between rest and movement, requires a force. Movement with constant velocity (including rest) does not need any explanation according to Newton. We can see the difference between Aristotle and Newton in terms of two different ways of making distinctions. The former makes a distinction between rest and movement, whereas the latter makes a distinction between constant velocity and changing velocity. I can draw two conclusions (at least) from this. First, the Newtonian way of making distinctions between bodies in movement is more powerful than the Aristotelian way (according to the study of physics). Second, the Newtonian way of making the distinction becomes visible when it is juxtaposed with the Aristotelian.

This is the tentative answer to why understanding one theory is necessary for the understanding of another theory. But does it really work like this in practice? Galili and Hazan (2000) provided empirical evidence that it does indeed. They compared the understanding of optics by students engaged in an experimental year-long historically oriented physics course (the target group) with that by students who participated in a comparable conventional course in physics (the comparison group). Many scientific ideas were appropriated by virtually all of the students in the target group. By way of contrast, the students in the comparison group showed very little understanding. The main mechanism behind this difference in achievement was the frequent use of contrasts between different ideas of the same thing in the experimental treatment. For example, in the comparison group, the view of light rays was reified as “what light comprises”. In contrast, in the target group, light rays were conceived as an auxiliary tool in modern science, as a result of exploring contrasting ontological claims of historical models.

My interpretation is thus that the important function of introducing alternative conceptions, whether of historical origin or the students' own naive views judged to be wrong by current science, is that those alternative conceptions make the conceptions judged right by current science visible. Without a contrast, all students can do is learn the “whole”, often by rote, and this does not prepare them for handling novel problems in powerful ways in the future. For example, the notion of an auxiliary tool does not have meaning without access to something that is not an auxiliary tool, such as the ontologically much “heavier” reified view of “light ray”.

Learning about the antique view makes it possible for the learner to understand the modern view, not because the two are similar, but because they are different. Although the two views are views of the same thing, the positive (transfer) effect does not derive from perceiving what is the same, but from discerning what is different. My suggestion is that learning about the antique view and then learning about the modern view is more conducive to learning than learning about the modern view twice. This should be true, at least if learning is measured in terms of the learner's discernment of critical differences between different instances of the phenomenon in question (e.g., force and motion or light rays).

Perceptual Learning

Perceptual learning as an academic specialization was established by Eleanor J. Gibson. In 1955 she published a highly influential article in Psychological Review, “Perceptual Learning: Differentiation or Enrichment?”, coauthored by her husband, James J. Gibson. The authors argued that there are two schools regarding perceptual learning. According to the enrichment school, we receive scarce, impoverished information from the environment, which must be added to and enriched. According to the differentiation school, we receive so much information from the world that we have to differentiate or select. Consequently, learning to perceive amounts to learning to find the differences that are most critical in relation to our goals. J. J. Gibson and Gibson (1955) demonstrated that learning to know something (in the sense of being able to recognize it) is a matter of learning how it differs from other things (i.e., becoming able to discern the respects in which it differs from other things).

The ecological approach to perceptual learning championed by Eleanor Gibson belongs clearly to the differentiation school. In her book with Anne Pick, she included the Gestaltists in the same camp, whereas the behaviorists and cognitivists were listed under enrichment theories (E. J. Gibson & Pick, 2000).

Learning to differentiate means making finer and finer discriminations. This differentiation amounts to becoming attuned to distinguishing features, or critical differences, that can be used for making distinctions. These distinctive features are simply dimensions in which things vary, and they can be used for telling instances apart from noninstances. Thus, perceptual learning amounts to discerning distinctive features or critical dimensions of variation. If I now apply the terminology of transfer to perceptual learning, it means that in Situation A the learner learns to differentiate between instances and noninstances, whereas in Situation B she encounters something that is either an instance or a noninstance. To the extent that the learner is better at correctly differentiating between instances and noninstances in Situation B as a result of having participated in Situation A, there will be a positive impact from Situation A on Situation B. If we agree with the Gibsonian view of perceptual learning, then performance in Situation B is a function of the differences within Situation A, as well as the differences between Situations A and B. Eleanor Gibson (1969) cited a transfer study carried out by Vurpillot et al. (1966) in which it was demonstrated that perceptual learning involves learning and transferring distinctive features that separate instances from noninstances (as opposed to learning and transferring features that the instances have in common).

This is true, of course, only if the learners are given opportunities for making such discriminations. An observation from Pavlov is relevant here. He found that reflexes that were conditioned to a certain stimulus were generalized to other similar stimuli (due to sameness).

The next question was, then, how can differentiation be learned? Pavlov believed that there were two options. One could either repeatedly reinforce the conditioned stimulus, or one could introduce an unreinforced contrast along with the reinforced conditioned stimulus. He found the first method did not work. There was no differentiation even though the stimulus was repeated with reinforcement more than 1000 times. On the other hand, even a single application of unreinforced contrast could lead to rapid differentiation (Pavlov, 1927, as cited by E. J. Gibson, 1969).

The chapter in E. J. Gibson's (1969) book on differentiation theory begins with a quote from the 19th-century British philosopher and theologian, James Martineau, that explains well the differentiating force of contrasts.

When a red ivory ball, seen for the first time, has been withdrawn, it will leave a mental representation of itself, in which all that it simultaneously gave us will indistinguishably coexist. Let a white ball succeed to it; now, and not before, will an attribute detach itself, and the color, by force of contrast, be shaken out into the foreground. Let the white ball be replaced by an egg, and this new difference will bring the form into notice from its slumber, and thus that which began by being simply an object cut out from the surrounding scene becomes for us first a red object, then a red round object, and so on (Martineau, Essays Philosophical and Theological, as quoted by E. J. Gibson, 1969).

This is an interesting quote in more than one respect. First, it describes what we might call retrospective transfer: how the image of an object is affected by experiences following the birth of the image. Second, we can look at the effects in the reverse direction, namely how the perception of objects is affected by previous experiences. The observer may not have separated out and noticed the form of the white egg without having seen the white ball just before, nor may she have separated out and noticed the color of the white ball without having seen the red ball just before.

Motor Learning

In 1975, Schmidt published his article “A Schema Theory of Discrete Motor Skill Learning”, which has become highly influential in the field of motor learning. One of the best known implications of the theory is the variability of practice hypothesis, which states that varied practice (“Situation A” in transfer terminology) is likely to enhance performance in a new situation (Situation B). What is interesting is that the relationship is not considered as a function of sameness between two situations, but as a function of differences across situations. The hypothesis received a great deal of support from empirical tests during the years following its publication (Shapiro & Schmidt, 1982; Shea & Wulf, 2005; Sherwood & Lee, 2003).

Let us look at a typical study carried out by Moxley (1979). Eighty children, aged 6 to 8 years old, participated in the experiment and were distributed across two conditions. The task was to try to hit a target (a carpet) with a shuttlecock. In one of the conditions the children threw 20 times from five different angles, each from the same distance, sitting on the floor with their feet pointing toward the opposite wall. In the other condition the children had to throw shuttlecocks 100 times from the same position. Afterwards both groups tried to hit the target with shuttlecocks from a new position. The group subjected to the varied condition succeeded best with the criterion task.

Kerr and Booth (1978) carried out a similar study, in which the task was again to hit an object, but the critical difference between attempts was the distance to the target. The task was to hit a target with miniature beanbags. Thirty-six children with an average age of 8.3 years and 28 children with an average age of 12.5 years participated in the experiment. Both age groups were randomly assigned to specificity and schema groups. All children were tested in the beginning and at the end of the experiment, the younger group throwing from 3 ft and the older from 4 ft. The deviation from the target was measured after each throw, and the difference between the average error at the beginning and at the end of the experiment was the measure of the effect of the practice between the two tests.

The practice for the specificity group consisted of a great number of throws, all from the criterion distance of 3 ft for the younger children and 4 ft for the older children. Children in the schema group practiced at a variety of distances, none of which was the criterion distance. It is interesting to note that the children in the schema group outperformed the children in the specificity group. Therefore, practicing something other than what was tested was more effective than practicing exactly what was tested.

Although the variability of practice hypothesis received much empirical support, Schmidt's (1975) theory has been seriously questioned (see Shea & Wulf, 2005; Sherwood & Lee, 2003). I would also like to interpret the variability of practice hypothesis in a way that differs from Schmidt's original formulation. He believed that varied practice enhances future performance with novel tasks thanks to the contribution of varied practice to a more robust schema governing behavior. My interpretation is that variation is necessary for discernment, and that the learners must discern critical differences between situations in order to be able to adjust to new situations.

Making Systematic Use of Variation to Enhance Learning

Understanding the relationship between what students learn in the classroom and what is happening in the classroom remains a central question in educational research. This question has typically been addressed by relating learning outcomes to the ways in which learning is organized (individualized instruction, whole-class teaching, project learning, peer learning, reciprocal teaching, and so on). Describing differences in how learning is organized does not necessarily say anything about differences in the content of learning or about differences in what learners attend to. In order to identify the latter, we have to compare different classrooms in which the same content is addressed. When we do so, we might notice that when specific content is dealt with, some of its aspects are varied, whereas others remain invariant. When looking at the graphical representations of linear equations of the form y = ax + b, for instance, the teacher might keep b invariant (e.g., b = 0) and substitute different values for a (e.g., a = 1, 1/2, 2, 3, …), thus focusing attention on the slope of the line. In a similar manner, the teacher might keep the slope invariant and vary the b value, thus focusing attention on parallel lines and varying intercepts. If only the slope is varied, the students might learn one thing. If only the y-intercept is varied, the students might learn another thing. If both parameters are varied, the students might learn a third thing. If both sources of variation are mixed in a random fashion, students might learn differently again. If there is one line in a coordinate system on the screen that the students can change by plugging different values into the equation y = ax + b, the outcome may again be different.
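
The first two patterns of variation and invariance can be made concrete with a short sketch. The slope values and b = 0 come from the example above; the fixed slope a = 1, the intercept values, the sample x values, and the helper name `line` are assumptions chosen only for illustration.

```python
def line(a, b):
    """Return the function x -> a*x + b (hypothetical helper)."""
    return lambda x: a * x + b


xs = [-1, 0, 1, 2]  # sample x values, chosen arbitrarily

# Pattern 1: b is invariant (b = 0) and the slope a varies.
# Every line passes through the origin; only the steepness differs.
vary_slope = {a: [line(a, 0)(x) for x in xs] for a in (1, 0.5, 2, 3)}

# Pattern 2: a is invariant (a = 1, an assumed value) and b varies.
# The lines are parallel; only the y-intercept differs.
vary_intercept = {b: [line(1, b)(x) for x in xs] for b in (-1, 0, 1, 2)}

for a, ys in vary_slope.items():
    print(f"a={a}, b=0:", ys)
for b, ys in vary_intercept.items():
    print(f"a=1, b={b}:", ys)
```

In the first table the value at x = 0 stays the same across lines while the rate of change differs; in the second the rate of change stays the same while the value at x = 0 differs. What is held invariant is what makes the varied aspect stand out.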

Differences in what varies and what is invariant make different things possible to learn. Consequently, the pattern of variation and invariance when dealing with some particular content is highly correlated with what the students learn about that content. For example, Marton and Pang (2006) found that in five comparable high-achieving secondary school classes, taught by five highly experienced teachers, the proportion of students who mastered a difficult economic concept varied between 6% and 97% depending on the different patterns of variation and invariance and on how systematically those patterns were used in the classrooms. In the next two sections I explore the effect of the systematic use of variation in classrooms by considering several examples.

Learning Cantonese words.

In two comparable second-grade classes in Hong Kong, the students were supposed to learn seven Chinese words in the context of a short story (Chik & Lo, 2004). Their success was measured by means of a “cloze-test”, a short novel text with a number of words deleted, leaving gaps between the words. The students were asked to fill in those gaps with the appropriate words. It was possible to fill the gaps by using the seven words that had been studied. In Class A, all 30 students completed the task correctly, whereas in Class B only 9 of 31 students did so. When comparing how the same content was handled in the two classes, it was found that in Class A different aspects of each word (form, meaning, pronunciation) were dealt with in conjunction with one another (i.e., these aspects were kept together for each word). By way of contrast, in Class B each aspect was dealt with separately (i.e., words were grouped according to each aspect). This meant that in Class A the integrity of the Chinese words was maintained (their related aspects were kept together), whereas in Class B the aspects were kept separate and the words appeared once for each aspect.

The same content was thus dealt with in both classes, but differently. In Class A, for each word the aspects varied. In contrast, in Class B, for each aspect the words varied. Thus, the differences had to do with how the content was structured, what varied primarily and what varied secondarily, and hence what learner attention was attracted to.

In many of the previous examples presented in this article, I have argued for the importance of contrast and of variation in critical aspects. In the study of Cantonese words, however, it was important that a number of critical aspects of words (form, meaning, and pronunciation) varied together (from word to word). As the words were basically known as spoken words to the students already, in order to link the spoken words to the written forms (characters) and to the elaborated meanings, it was advantageous for the different aspects to vary together. This shows that there are no patterns of variation or invariance that are superior for learning or for transfer in general. The point I want to make is that patterns of variation and invariance are critically important for learning and transfer, but the specific pattern depends on the particular object of learning and on the group of learners participating in the learning experience.

Learning Cantonese tones.

I now extend the discussion to consider an instructional approach in which the pattern of variation and invariance is theoretically based.[3] Ki and Marton (2003) investigated the difficult process of learning Cantonese tones by adult foreigners. Cantonese is a complex dialect of Chinese, and spoken Chinese words are distinguished by both sound and tone. This means that two or more words pronounced with the same sound can have (and usually do have) two or more different meanings based on distinctions in tone or pitch. There are six different tones in Cantonese: high level, high rising, mid level, low level, low falling, and low rising. One might believe that Cantonese speakers are more sensitive to tonal differences, but this is not the case. Stagray and Downs (1993) compared native speakers of English and Cantonese and concluded that the former actually made finer discriminations in pairwise comparisons of singular tones. Thus there must be another explanation for foreigners' difficulties with Cantonese.

Differences in pitch (of which tones make up a special case) are also important in nontonal languages, but in these they are used to make distinctions at the sentence level (e.g., between interrogative and imperative modes), not to indicate differences in meaning at the word level. Consequently, when a speaker of a nontonal language hears a tonal language in which every word has both a sound and a tone component, she automatically tries to distinguish between meanings in terms of differences in sound alone. By doing so she remains incapable of making essential distinctions in perception or in production between words that have the same sound but different tone components. When variation in sound is suppressed and variation in tone and meaning is afforded, the learner must, however, use tonal variation for making distinctions between meanings.

Ki and Marton (2003) found that when nonnative learners were invited to learn sets of Cantonese words that have the same sound but different tone components, they learned to attend to tonal differences in order to make distinctions between meanings on the word level. This represented a fundamental switch in learners' attentional field. Not only did the learners get better at distinguishing between meanings by means of tones as regards the words in the exercise, but they also got better at learning to distinguish between meanings by means of tones when variation between sounds was not suppressed. In a follow-up study, Ki and Marton (2005) demonstrated that by first separating variation in tone and meaning and keeping sound invariant, then separating variation in sound and meaning and keeping tone invariant, and finally letting all three dimensions vary simultaneously, learners can make great progress toward handling the complexities of everyday use of Cantonese.
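The three-step sequence of the follow-up study can be laid out schematically. The sketch below is an illustration only, not the materials or procedure used by Ki and Marton: the sound labels are placeholders, the function names are hypothetical, and each (sound, tone) pair is assumed to stand for a word with its own meaning, so that meaning varies in every phase.

```python
from itertools import product

# Placeholder syllable labels; NOT items from the study.
SOUNDS = ["sound_a", "sound_b", "sound_c"]
# The six Cantonese tones as listed in the text.
TONES = ["high level", "high rising", "mid level",
         "low level", "low falling", "low rising"]


def phase_items(phase, sounds=SOUNDS, tones=TONES):
    """Return the (sound, tone) pairs offered in a given phase."""
    if phase == 1:   # sound invariant, tone (and meaning) vary
        return [(sounds[0], t) for t in tones]
    if phase == 2:   # tone invariant, sound (and meaning) vary
        return [(s, tones[0]) for s in sounds]
    if phase == 3:   # sound, tone, and meaning all vary
        return list(product(sounds, tones))
    raise ValueError("phase must be 1, 2, or 3")


def varying_dimensions(items):
    """Report which of the two audible dimensions vary across the items."""
    varying = set()
    if len({sound for sound, _tone in items}) > 1:
        varying.add("sound")
    if len({tone for _sound, tone in items}) > 1:
        varying.add("tone")
    return varying


if __name__ == "__main__":
    for phase in (1, 2, 3):
        print(phase, sorted(varying_dimensions(phase_items(phase))))
```

In the first phase only tone separates the items, so tone is the only cue available for telling meanings apart; in the third phase both cues are in play, as in everyday use.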

Learning to spell the tj (C) sound in Swedish.

Another study showed that which distinctions the learners can make, and have to make, is highly important for learning. A group of Swedish teachers, working with a researcher, designed a lesson that they thought would contribute to improving students' spelling of words with a tj sound (Holmqvist, Gustavsson & Wernberg, 2005; in press). There are different ways to spell this sound that conform to a rule-governed system based on consonant-vowel combinations with certain exceptions. One of the teachers carried out the lesson designed by the group. Afterwards the group redesigned the lesson, and it was carried out by a second teacher with another group of students. The group revised the lesson again, and it was carried out by a third teacher with a third group of students. The students participating in these three lessons came from different classes, and they were kept together for one lesson only, after which they returned to their own classes. All three groups took the same spelling test immediately after the lesson, again after 4 weeks, and once again after another 8 weeks. One of the groups lagged behind the other two groups initially but surpassed both groups some time after the experiment. This group improved on each test, whereas the other two groups remained at the same level or performed slightly worse. The researchers investigated how the instructional treatments contributed to the differences across groups.

To explain why one group improved more than the two other groups after the learning occasion, the researchers conjectured that the students in the more successful group learned to pay attention to the differences between the tj sound and the sj sound (and other sounds), as well as to different spellings of the tj sound as compared to different spellings of the sj sound. The students became sensitized to a wider range of differences than the students in the other groups. Hence they had more opportunities for practicing what they had learned whenever they read or wrote something in Swedish subsequent to their participation in the experiment. This condition was simply a better preparation for future learning, in the sense used by Bransford and Schwartz (1999).

Providing for individual differences.

The last study described in the previous section was a blend of Japanese “lesson study” and design-based research (Holmqvist et al., in press). In such a blended approach, called learning study, an object of learning is chosen by a group of teachers assisted by a researcher. The students' prior understanding of the object of learning is revealed by means of a pretest. One or more lessons are designed to help students develop an understanding of the chosen object of learning by building upon students' prior understandings. The lesson (or sequence of lessons) is carried out by one of the teachers. Students are assessed after the lesson, and the results are discussed in relation to what has taken place in the classroom. As a result, the lesson is revised and taught by another teacher. One study might comprise one to four such iterative cycles.

Twenty-seven classes participated in a research and development project aimed at providing for individual differences through learning studies in Hong Kong. Special attention was paid to low-achieving students. Systematic use of variation was adopted as the main approach to the enhancement of learning in the project (Lo, Pong & Chik, 2005). Seventeen lessons were designed, with a number being revised once or twice, for a total of 27 lessons. In all of the lessons, variation was used in systematic ways. In 25 of the 27 cases, students performed significantly better after the lesson than they had before. Furthermore, this improvement was significantly higher among initially low-achieving students than among high-achieving students, even when correction was made for ceiling effects. Transfer to the annual attainment test was found, especially for the low-achieving students. Finally, researchers observed that high-achieving students developed insights that went well beyond the object of learning and hence were not measured by the assessment tools.