Using “Think Alouds” to Evaluate Deep Understanding
Lendol Calder and Sarah-Eva Carlson
September 25, 2002

Lendol Calder is indeed a teacher-scholar.  An associate professor and chair of the history department at Augustana College, he was a 1999 Fellow at the Carnegie Academy for the Scholarship of Teaching and Learning (CASTL) in the prestigious Pew Scholars Program.  He regularly speaks to audiences about the problems of introductory and survey-level teaching and shares his research on how to replace "coverage" with "uncoverage" in history survey courses.  Calder holds a PhD from the University of Chicago, and is the author of Financing the American Dream: A Cultural History of Consumer Credit (Princeton, 1999).

Sarah-Eva Carlson is a history major and senior at Augustana College.  After graduation she hopes to study International Relations at Cambridge University.  As Dr. Calder's senior research assistant for the "Beyond Coverage" project, Carlson helped to conduct, code, and evaluate think alouds.

“Deep understanding” is what teachers want for students.  But how do we know when it has been achieved?  And are certain assessments better than others at shedding light on what students really know and understand?  Few would defend the method used by a deaf English public schools inspector who, after listening to student recitations, would rise and declare, “I have not been able to hear anything you have said, but I perceive by the intelligent look on your faces that you have fully mastered the text.”  This essay describes the authors’ experience with ineffective student learning assessments and their subsequent employment of an effective technique called the think aloud method.

Think alouds are a research tool originally developed by cognitive psychologists to study how people solve problems.  The basic idea behind a think aloud is that if a subject can be trained to think out loud while completing a defined task, then the introspections can be recorded and analyzed by researchers to determine what cognitive processes were employed to deal with the problem.  In fields such as reading comprehension, mathematics, chemistry, and history, think alouds have been used to identify what constitutes “expert knowledge” as compared to the thinking processes of nonexperts.  For first-year assessors, think alouds offer a promising method to uncover what conventional assessment methods often miss: hidden levels of student insight and misunderstanding.

Experienced teachers know that popular assessment methods conceal as much as they reveal.  Papers and exams, for example, offer little help in figuring out why a student has recorded a wrong answer or struggled unsuccessfully with an assignment.  Conventional assessments also run into problems of validity.  Because they rely on students’ ability to express themselves in formal language, papers and exams tend to conflate understanding with fluency.  But sometimes, especially with first-year students, the tongue-tied harbor deep understandings even though they perform poorly.  The reverse is true as well: sometimes articulate students are able to say more than they really understand.  “The thorniest problem” of assessment, according to Grant Wiggins and Jay McTighe (1998), is differentiating between the quality of an insight and the quality of how the insight is expressed.

We first utilized think alouds when assessing a new design for a first-year history course.  The new design shifted emphasis away from tidy summaries of historical facts and knowledge toward the central questions, methods, assumptions, skills, and attitudes that characterize history as a discipline.  Students completed eight identical assignments in the course, and student learning was measured by comparing the students’ first and last papers.  The results were disheartening.  It was the rare student who showed steady progress from week to week, and few of the final papers were superior to the first ones.  On the basis of this evidence, it seemed the new course was a failure.

But different evidence suggested otherwise.  In course evaluations and self-reports, students insisted they had learned a great deal, a claim that certainly squared nicely with the intelligent looks on their faces at the end of the term.  Puzzled by the conflicting evidence, we turned to think alouds for help.

Our procedure was as follows.  From sixty students in the course, twelve were selected to participate in a think aloud study, representing a cross-section of students in terms of gender, grade point average, and major or nonmajor status.  For their participation, subjects were paid ten dollars an hour.  In week one of the course, we sat down with each student in a room equipped with a tape recorder.  After training subjects to think out loud, we presented them with documents concerning the Battle of the Little Bighorn, a subject most knew little about.  Then we asked our subjects to think out loud while “making sense” of the documents.  This was essentially the same task they would perform eight times over the length of the course, though in this case their thoughts would not be filtered by the task of composing an essay.  With the tape recorder running, subjects read through the documents aloud, verbalizing any and all thoughts that occurred to them.  When subjects fell silent, we would prompt them to think out loud or to elaborate on their thoughts as they attempted to make sense of the historical evidence.

Our think aloud sessions lasted anywhere from forty to ninety minutes.  After all twelve sessions were completed, the tape recordings were transcribed for analysis.  Analysis took the form of coding each discrete verbalization in the transcript according to the type of thinking it exemplified.  We were able to identify fifteen different types of thinking processes displayed in the think alouds, from the uncategorizable (“it sure is hot in here”) to comprehension monitoring (“I don’t understand that part”) to the six types of historical thinking we were particularly looking for, such as sourcing a document (“I can’t trust Whittaker; he wasn’t there”), asking a historical question (“I wonder what caused this battle?”), or recognizing limits to knowledge (“I need to see more evidence than this”).  After coding each think aloud independently, we used a common rubric to rate each subject’s proficiency on the six thinking skills taught in the course.  For this, we used a 5-point Likert scale where “1” indicated the undeveloped ability of an average high school senior and “5” indicated a sophistication comparable to that of a professional historian.  We then compared our coded transcripts until we reached consensus on how to rate the students’ abilities in the six key areas.  To prevent our bias as course designers from influencing the results, we contracted with an outside analyst to help us code the transcripts and rate students’ abilities.
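
For readers who keep their coded transcripts in electronic form, the tallying step described above might be sketched as follows.  This is only an illustration, not the procedure we actually used: the category labels, the function names, and the idea of a "historical share" are assumptions made for the example, and only three of the six historical thinking types are listed because only those three are named in this essay.  The sample verbalizations are the ones quoted above.

```python
from collections import Counter

# Hypothetical coded transcript: (verbalization, code) pairs.
# The code labels are invented for this sketch.
coded_transcript = [
    ("it sure is hot in here", "uncategorizable"),
    ("I don't understand that part", "comprehension_monitoring"),
    ("I can't trust Whittaker; he wasn't there", "sourcing"),
    ("I wonder what caused this battle?", "historical_question"),
    ("I need to see more evidence than this", "limits_of_knowledge"),
]

# Assumed labels for the historical thinking codes named in the essay.
HISTORICAL_CODES = {"sourcing", "historical_question", "limits_of_knowledge"}


def tally_codes(transcript):
    """Count how many verbalizations received each code."""
    return Counter(code for _, code in transcript)


def historical_share(transcript):
    """Fraction of verbalizations coded as historical thinking."""
    if not transcript:
        return 0.0
    hits = sum(1 for _, code in transcript if code in HISTORICAL_CODES)
    return hits / len(transcript)


if __name__ == "__main__":
    print(tally_codes(coded_transcript))
    print(historical_share(coded_transcript))
```

A tally of this kind only summarizes the coding; deciding which code a verbalization deserves remains a matter of judgment for the coders.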

At the end of the term the twelve subjects completed a second think aloud.  When these sessions had been transcribed and coded and the subjects’ abilities rated, we compared the first and second think alouds to determine whether students had made gains in their understanding of what it means to “think historically.”
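
The comparison of first and second think alouds can be pictured as a per-skill subtraction of rubric ratings.  The sketch below assumes each subject's ratings are stored as a mapping from skill name to a 1-5 score; the skill names, function names, and numbers are placeholders invented for illustration and are not results from our study.

```python
def rating_gains(first, second):
    """Return second-minus-first rating for each skill.

    Both arguments map a skill name to a rubric rating (1-5).
    """
    if first.keys() != second.keys():
        raise ValueError("both think alouds must rate the same skills")
    return {skill: second[skill] - first[skill] for skill in first}


def mean_gain(gains):
    """Average change across all rated skills."""
    return sum(gains.values()) / len(gains)


# Placeholder ratings for one hypothetical subject (not real data).
first_ratings = {"sourcing": 1, "historical_question": 2, "limits_of_knowledge": 1}
second_ratings = {"sourcing": 3, "historical_question": 3, "limits_of_knowledge": 2}

if __name__ == "__main__":
    gains = rating_gains(first_ratings, second_ratings)
    print(gains)
    print(mean_gain(gains))
```

Because Likert ratings are ordinal, an average gain like this is best read as a rough summary rather than a precise measurement.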

The think alouds opened a fascinating window into the thinking patterns of students before and after the course.  Overall, the think alouds revealed cognitive gains that were not as dramatic as students claimed in their self-reports, but much greater than comparisons of early and late papers had indicated.

Other surprises were equally interesting.  Under-performing students struggled less with historical thinking than with reading itself.  Moreover, in the second set of think alouds, we noted that some of the best insights and meaning making came from students who, in the gradebook, were steady “B” and “C” performers.  For them, deep understandings seemed to evaporate when they tried to wrestle their thoughts to paper.  This told us that we had work to do if we wanted to distinguish between assessing understanding and assessing students’ ability to communicate their understanding.  The real roadblocks to learning historical thinking, we discovered, are poor reading comprehension and weak prose writing.

On our campus, the potential of think aloud protocols has not been lost on other faculty.  For example, library staff are using think alouds to assess how first-year students find information when they go to the library.  Information gained from the study will be used to help library staff identify misconceptions and teach against common missteps students make when doing research.

Think alouds are not perfect assessment instruments.  Their advantage is that they let us see our students’ struggle to formulate problem-solving strategies, employ skills, and develop insights.  Papers, exams, and ex post facto commentary by students are helpful in their own way.  But they make the process of understanding seem more orderly than it is, covering up the confusion, the disorientation, the mimicry of correct responses, and the lucky guesses, all of which are good to know about when assessing teaching and learning.

As the emphasis in first-year pedagogy switches from teaching to learning, from “what am I going to do today” to “what are they going to do today,” the days of using only papers and exams to assess student learning are long gone.  Teachers need more procedures capable of opening up the hidden world of learning.  Think alouds can be helpful in this way, especially in courses emphasizing the development of cognitive skills.


On course design and assessment: 

Wiggins, G. & McTighe, J.  (1998).  Understanding by Design.  Association for Supervision and Curriculum Development.

On think alouds—their history, effectiveness, and procedures:

van Someren, M. W., Barnard, Y. F. & Sandberg, J. A. C.  (1994).  The Think Aloud Method: A Practical Guide to Modelling Cognitive Processes.  Academic Press.

Ericsson, K. A. & Simon, H. A.  (1993).  Protocol Analysis: Verbal Reports as Data. MIT Press.

Meyer, D.K.  (1993, Summer).  Recognizing and Changing Students’ Misconceptions: An Instructional Perspective.  College Teaching 41, 104-108.

Disciplinary Uses of Think Alouds: 


Crain-Thoreson, C., Lippman, M. Z., & McClendon-Magnuson, D.  (1997, December).  Windows on Comprehension: Reading Comprehension Processes as Revealed by Two Think-Aloud Procedures.  Journal of Educational Psychology 89, 579-591.

Kucan, L. & Beck, I. L.  (1997, Fall).  Thinking Aloud and Reading Comprehension Research: Inquiry, Instruction, and Social Interaction.  Review of Educational Research 67, 271-299.

Swearingen, R. & Allen, D.  (1997).  Classroom Assessment of Reading Processes. Houghton Mifflin.

Wade, S. E.  (1990, March).  Using Think Alouds to Assess Comprehension.  Reading Teacher 43, 442-451.


Bowen, C. W.  Think-Aloud Methods in Chemistry Education:  Understanding Student Thinking.  Journal of Chemical Education 71, 184-190.


Schoenfeld, A. H.  (1987).  Cognitive Science and Mathematics Education.  Lawrence Erlbaum Associates.


Calder, L.  (2002, March).  Looking for Learning in the History Survey.  American Historical Association Perspectives 40, 43-45.

Wineburg, S. S.  (1992, March).  Probing the Depths of Students’ Historical Knowledge. American Historical Association Perspectives 30, 1-24.

Wineburg, S. S.  (1991, March).  Historical Problem Solving: A Study of the Cognitive Processes Used in the Evaluation of Documentary and Pictorial Evidence.  Journal of Educational Psychology 83, 73-87.

These comments were posted to the First-Year Assessment Listserv (FYA-List) on September 25, 2002. Recipients are free to forward this message to other interested persons.