Evaluating New-Student Seminars and Other First-Year Courses via Course-Evaluation Surveys: Research-Based Recommendations Regarding Instrument Construction, Administration, Data Analysis, Data Summary, and Reporting Results
By: Joe Cuseo
The recommendations below have been adapted from an essay written by Joe Cuseo in 2000 for the FYA-Listserv. These suggestions will help you build an effective and valid course evaluation instrument for your first-year (and other) courses.
Comments in support of each recommendation appear after the summary lists below. A list of references is available at the bottom of this page.
Recommendations Regarding the Construction of the Course-Evaluation Instrument
- Cluster individual items into logical categories that represent important course objectives or instructional components.
- Provide a rating scale that allows five-to-seven choice points or response options.
- If possible, do not include neutral, "don't know," or "not sure" as a response option.
- Include items that ask students to report their behavior.
- Ask for comments and leave space for them after each question.
- Include at least two global items on the evaluation instrument pertaining to overall course effectiveness or course impact.
- Include an open-ended question asking for written comments about the course's strengths and weaknesses, and how the latter may be improved or rectified.
- Provide some space at the end of the evaluation form so that individual instructors can add their own questions (Seldin, 1993).
- Give students the opportunity to suggest questions that they think should be included on the evaluation form.
Recommendations Regarding the Wording/Phrasing of Individual Items
- Avoid generic expressions that require a high level of inference or interpretation, which may have different meaning for different students (for example, "Pacing of class presentations was 'appropriate'.")
- When soliciting information on the incidence or frequency of an experienced event (e.g., "How often have you seen your advisor this semester?"), avoid response options that require high levels of inference on the part of the reader (e.g., "rarely"-"occasionally"-"frequently").
- When asking students to rate their degree of involvement or satisfaction with a campus support service or student activity, be sure to include a zero or "not used" option.
- Use the singular, first-person pronoun ("I" or "me") rather than the third-person plural ("students").
- Avoid compound sentences that ask the student to rate two different aspects of the course simultaneously (e.g., "Assignments were challenging and fair.")
- Include one or two negatively worded items that require students to reverse the rating scale (e.g., "I did not receive effective feedback on how I could improve my performance.").
Recommendations for Administration of Course Evaluations
- Across all sections of a given course, standardize the amount of time allotted for students to complete their evaluations.
- Take into consideration the academic term or semester when course evaluations are to be administered.
- Be aware of the “burn-out,” or fatigue factor.
- Across all sections of a given course, standardize the instructions read to students immediately before distribution of the evaluation forms.
- Use instructions read prior to distribution of evaluation forms to prepare or prime students for the important role they play in evaluating college courses and to provide students with a positive "mental set".
- If possible, standardize the behavior of instructors during the time when students complete their evaluations.
Recommendations for Analyzing, Summarizing, and Reporting the Results of Course Evaluations
- Report both the central tendency and variability of students' course ratings.
- The identification and sharing of strategies for instructional improvement should be an essential component of course assessment, and it is a form of feedback that has been commonly ignored or overlooked when student ratings are used to evaluate college courses (Stevens, 1987; Cohen, 1990).
- To maximize the opportunity for instructors to make instructional improvements while the course is still in progress, it is recommended that course evaluations be administered at midterm to obtain early feedback.
- Compare evaluations to those provided for other first-year courses to gain a reference point for interpreting student perceptions of the course.
- Surveys or questionnaires could also be used to obtain a different type of comparative perspective on the course: the retrospective perceptions of course alumni.
- Administer the course-evaluation instrument to students before the class begins and re-administer it at course completion to gain perspective on student changes in attitude or behavior between the start and finish of the course.
Home Grown vs. Commercially Available Course Evaluation Instruments
There are also a number of commercially developed instruments available to assess students' reported behaviors and their degree of involvement with campus services and student activities, such as
- "The College Life Task Assessment Instrument" (Brower, 1990, 1994),
- the "College Student Experiences Questionnaire" (CSEQ)(Pace, 1990; Pace & Kuh, 1998), and
- the "Critical Incident Techniques & Behavioral Events Analysis" (Knapp & Sharon, 1975).
The availability of such instruments raises the larger issue of whether the college should rely exclusively on locally developed ("home grown") assessment instruments for evaluating first-year courses, or whether it should purchase externally constructed ("store bought") instruments from testing services or centers. There are some advantages associated with external, commercially developed instruments, namely:
- Their reliability and validity are usually well established.
- They are efficient, saving time that the institutional members would have to devote to instrument construction and scoring.
- Norms are typically provided by the testing service, allowing the institution to gain a comparative perspective by assessing the responses of its own students relative to national averages.
However, a major disadvantage of externally developed instruments is that the questions asked and results generated may be less relevant to the institution's unique, campus-specific goals and objectives than those provided by internally constructed instruments. Some testing services attempt to minimize this disadvantage by allowing the college the option of adding a certain number of their own questions to the standardized instrument. The availability of this option may be one major factor to consider when reaching decisions about whether or not to purchase an external instrument.
Another important factor to consider is the purchase and scoring cost associated with the use of an externally developed instrument, relative to the anticipated cost (in time and money) to the college if it developed and scored its own instrument. Supporting the latter approach are the results of one college-based review of assessment methods in which it was concluded that the cost of externally developed surveys makes them viable only for "one-time projects or single, annual projects, and only if campus-based expertise is unavailable" (Malaney & Weitzer, 1993, p. 126).
Peter Ewell, a nationally recognized assessment scholar, warns about another subtle disadvantage associated with what he calls "off the shelf" (externally developed) instruments:
"Off-the-shelf instruments are easy. Although they cost money, it's a lot less than the effort it takes to develop your own. But buying something off-the-shelf means not really engaging the issues that we should: for example, What are we really assessing? and What assumptions are we making about the nature of learning" (cited in Mentkowski et al., 1991, p. 7).
To improve the reliability and validity of campus-specific instruments that are designed internally, a pilot study of the instrument should be conducted on a small sample of students to assess whether the instrument's instructions are clear, the wording of each of its items is unambiguous, and the total time needed to complete the instrument is manageable. As Fidler (1992) notes, "Seasoned researchers recommend pilot studies before every research project in order to identify and quickly alter procedures, thereby alleviating problems before they can invalidate an entire study" (p. 16).
Comments in Support of Above Suggestions
Recommendations Regarding the Construction of the Course-Evaluation Instrument
1. Cluster individual items into logical categories that represent important course objectives or instructional components.
Items that comprise the course-evaluation instrument could be arranged so that the stated objectives of the course and similar objectives are clustered into separate sections. Alternatively, items pertaining to each of the three key components of instruction could be grouped together in separate sections, namely:
- course planning and design (e.g., questions pertaining to overall course organization and clarity of course objectives);
- classroom instruction (e.g., items pertaining to in-class teaching, such as clarity and organization of lectures or instructional presentations); and
- evaluation of student performance (e.g., items pertaining to the fairness of tests, assignments, grading practices, and the quality of feedback provided by the instructor).
A healthy balance of questions pertaining to both course content (topics and subtopics) and instructional process (in-class and out-of-class learning activities) could also be included and clustered together on the evaluation form.
A major advantage of this organizational strategy is that the separate sections or categories can function as signposts or retrieval cues for the designers of the survey, ensuring that the items selected for inclusion in the instrument reflect a well-balanced sample of the major course dimensions that affect the quality of the students' learning experience.
Another advantage of grouping items under section headings is that they can function as cues or signals to students completing the instrument that there are distinct dimensions to the course. This may help them to discriminate among important components of course effectiveness, thereby increasing the likelihood that they will assess them independently.
Lastly, partitioning the instrument into separate sections that reflect separate course dimensions should help to reduce the risk of a general "halo effect," i.e., the tendency for a student to complete the evaluation instrument by giving each item the same positive or negative score (such as all “1s” or “5s”).
2. Provide a rating scale that allows five-to-seven choice points or response options.
Research evidence suggests that offering fewer than five choices reduces the instrument's ability to discriminate between satisfied and dissatisfied respondents, and that offering more than seven rating-scale options adds nothing to the instrument's discriminability (Cashin, 1990).
3. If possible, do not include neutral, "don't know," or "not sure" as a response option.
This alternative could generate misleading results because it may be used as an "escape route" by students who do have strong opinions but are reluctant to offer them (Arreola, 1983).
4. Include items that ask students to report their behavior.
Astin (1991) suggests a taxonomy for classifying types of data that may be collected in the assessment process, which includes two broad categories:
- psychological data reflecting students' internal states, and
- behavioral data reflecting students' activities.
Traditionally, student course evaluations have focused almost exclusively on the gathering of psychological data (student perceptions or opinions). However, given that one of the major goals of most new-student seminars is to increase students' actual use of campus services and student involvement in campus life (Barefoot & Fidler, 1996), items which generate behavioral data pertaining to use of campus services, or frequency of participation in co-curricular activities, should also be included on the evaluation instrument.
5. Ask for comments and leave space for them after each question.
Written comments often serve to clarify or elucidate numerical ratings, and instructors frequently report that written comments are most useful for course-improvement purposes, especially if such comments are specific (Seldin, 1992). As Jacobi (1991) points out, "The typical survey consists of a majority of closed-ended items, with limited opportunities for open-ended responses. This format does not encourage students to explore their attitudes, feelings, or experiences in depth and therefore may provide incomplete information about why students think, feel, or behave in a particular manner" (p. 196).
Allowing students to write comments with respect to each individual item, rather than restricting them to the usual "general comments" section at the very end of the evaluation form, should also serve to increase the specificity of students' written remarks and, consequently, their utility for course or program improvement.
6. Include at least two global items on the evaluation instrument pertaining to overall course effectiveness or course impact.
The following statements illustrate global items that are useful for summative evaluation:
- I would rate the overall quality of this course as:
(poor) - (excellent).
- I would rate the general usefulness of this course as:
(very low) - (very high).
- I would recommend this course to other first-year students:
(strongly agree) - (strongly disagree).
Responses to these global items can provide an effective and convenient summary or “summative” snapshot of students' overall evaluation of the course that can be readily used in program assessment reports. Research has repeatedly shown that these global ratings are more predictive of student learning than student ratings given to individual survey items pertaining to specific aspects or dimensions of course instruction (Braskamp & Ory, 1994; Centra, 1993; Cohen, 1986). As Cashin (1990) puts it, global items function "like a final course grade" (p. 2).
Abrami (1989) argues further, "it does make conceptual and empirical sense to make summative decisions about teaching using a unidimensional [global] rating. This choice then frees us to recognize that the particular characteristics of effective teaching vary across instructors" (p. 227). Thus, ratings on such unidimensional or global items may be used to make summative (overall) assessments of the course or instructor. In contrast, it is not a valid practice to add up student ratings for all individual items on the questionnaire and then simply average them in order to obtain an overall evaluation of the course. This procedure not only is inefficient, it is also an ineffective index of overall course satisfaction because it gratuitously assumes that each individual item carries equal weight in shaping the students' overall evaluation of the course.
Inclusion of global items on the evaluation instrument not only provides a valid snapshot of course effectiveness, it also allows for the examination of relationships between students' overall course ratings and their ratings on individual items pertaining to specific course dimensions. The inclusion of both global and specific items could reveal those specific aspects or dimensions of the course that carry the most weight in determining students' overall perceptions and their overall level of satisfaction with the course, and may also represent key target areas for course improvement.
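One way to examine the relationship described above is to correlate each specific item with a global item across respondents. The following is a minimal sketch rather than a prescribed procedure: the item names and ratings are hypothetical, and Pearson's r is an assumed choice of association measure (a rank-based measure may suit ordinal rating scales better).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # undefined when every student gave the same rating
    return cov / sqrt(var_x * var_y)

def rank_items_by_global(specific, global_ratings):
    """Sort specific items by the strength of their correlation with the global item."""
    scored = []
    for item, ratings in specific.items():
        r = pearson_r(ratings, global_ratings)
        if r is not None:
            scored.append((item, r))
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical data: one rating per student on a 5-point scale
global_quality = [5, 4, 2, 3, 5, 1]
specific_items = {
    "feedback_helped_me": [5, 4, 2, 3, 4, 1],
    "pace_suited_me": [3, 4, 3, 2, 4, 3],
}
ranking = rank_items_by_global(specific_items, global_quality)
for item, r in ranking:
    print(f"{item}: r = {r:.2f}")
```

Items near the top of such a ranking are candidates for the "key target areas" mentioned above, although a correlation by itself does not show that improving an item would raise overall satisfaction.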
7. Include an open-ended question asking for written comments about the course's strengths and weaknesses, and how the latter may be improved or rectified.
Such questions can often provide useful information about students' general reaction to the course as well as specific suggestions for course improvement. For example, in a new-student seminar, students could be asked to provide a written response to a question that asks them to "describe a major change (if any) in their approach to the college experience that resulted from their participation in the course." Or, students could be asked, "Was there anything important to learn about being a successful student that was not addressed in the course?" The written responses to these questions provided by students in separate class sections could be aggregated and their content analyzed to identify recurrent themes or response categories. (This is an example of how qualitative data can be gathered simultaneously along with the usual quantitative data generated by student-ratings instruments.)
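The aggregation step described above can be sketched in a few lines. This is only an illustration under stated assumptions: it presumes a reader has already assigned one or more theme labels to each comment by hand, and the section names, comments, theme labels, and the "at least two mentions" threshold are all hypothetical.

```python
from collections import Counter

def tally_themes(coded_comments):
    """Count how many comments were assigned each theme, across all sections."""
    counts = Counter()
    for comments in coded_comments.values():
        for _comment_text, themes in comments:
            counts.update(set(themes))  # count each theme once per comment
    return counts

# Hypothetical hand-coded comments, grouped by class section
coded = {
    "section_1": [
        ("I started using the tutoring center.", ["campus_services"]),
        ("I plan my week now and meet my advisor.", ["time_management", "campus_services"]),
    ],
    "section_2": [
        ("I learned to manage my time.", ["time_management"]),
        ("Nothing changed.", ["no_change"]),
    ],
}
theme_counts = tally_themes(coded)

# Assumed threshold: a theme is "recurrent" if at least two comments mention it
recurrent = sorted(theme for theme, n in theme_counts.items() if n >= 2)
print(recurrent)
```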
8. Provide some space at the end of the evaluation form so that individual instructors can add their own questions (Seldin, 1993).
This practice enables instructors to assess specific instructional practices that are unique to their own course. Also, this option should serve to give instructors some sense of personal control or ownership of the evaluation instrument that, in turn, may increase their motivation to use the results in a constructive fashion.
9. Give students the opportunity to suggest questions that they think should be included on the evaluation form.
This opportunity could be cued by a prompt at the end of the evaluation form, such as, "Suggested Questions for Future Evaluations." This practice has three major advantages:
- It may identify student perspectives and concerns that the evaluation form failed to address,
- It shows respect for student input, and
- It gives students some sense of control or ownership of the evaluation process.
Recommendations Regarding the Wording/Phrasing of Individual Items
1. Avoid generic expressions that require a high level of inference or interpretation, which may have different meaning for different students (for example, "Pacing of class presentations was 'appropriate'.")
When soliciting information on the incidence or frequency of an experienced event (e.g., "How often have you seen your advisor this semester?"), provide response options that represent specific numbers or a specific numerical range (e.g., 0, 1-2, 3-4, 5 or more), rather than adverbial descriptors (e.g., "rarely"-"occasionally"-"frequently").
2. When soliciting information on the incidence or frequency of an experienced event (e.g., "How often have you seen your advisor this semester?"), avoid response options that require high levels of inference on the part of the reader (e.g., "rarely"-"occasionally"-"frequently").
Instead, provide options in the form of numbers or frequency counts that require less inference or interpretation by the reader (e.g., 0, 1-2, 3-4, 5 or more times). This practice should serve to reduce the likelihood that individual students will interpret the meaning of response options in different ways.
3. When asking students to rate their degree of involvement or satisfaction with a campus support service or student activity, be sure to include a zero or "not used" option.
This response alternative allows a valid choice for those students who may have never experienced the service or activity in question (Astin, 1991).
4. Use the singular, first-person pronoun ("I" or "me") rather than the third-person plural ("students").
For instance, "The instructor gave me effective feedback on how I could improve my academic performance" should be used rather than, "The instructor gave students effective feedback on how they could improve their performance." The rationale underlying this recommendation is that an individual student can make a valid personal judgment with respect to her own course experiences but she is not in a position to judge and report how other students, or students in general, perceive the course.
5. Avoid compound sentences that ask the student to rate two different aspects of the course simultaneously (e.g., "Assignments were challenging and fair.")
This practice forces the respondents to give the same rating to both aspects of the course, even if they are more satisfied with one aspect than the other. For example, a student may feel the assignments were very "fair," but not very "challenging."
6. Include one or two negatively worded items that require students to reverse the rating scale (e.g., "I did not receive effective feedback on how I could improve my performance.").
Such items serve two purposes:
- They encourage students to read and rate each item carefully, serving to reduce the frequency of "positive-response-set" mistakes (Arreola & Aleamoni, 1990), which can occur when the respondent goes straight down a rating column and fills in a uniformly high rating for all items.
- They could be used to identify evaluation forms that have not been completed carefully and may need to be deleted from the data analysis. For example, students who have responded thoughtlessly by filling in all positive or all negative ratings may be identified by their failure to reverse their response bias on the negatively worded item(s).
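The screening idea in the second point can be sketched as follows. This is a minimal illustration, assuming a 1-to-5 scale and hypothetical item names; it flags forms for review rather than deleting them, and it treats a uniform midpoint rating as consistent because reversing the midpoint leaves it unchanged.

```python
def reverse_score(rating, scale_min=1, scale_max=5):
    """Flip a rating on a negatively worded item so that high always means favorable."""
    return scale_min + scale_max - rating

def is_straight_lined(form, negative_items, scale_min=1, scale_max=5):
    """Flag a form whose raw ratings are identical on every item,
    including at least one negatively worded item."""
    raw_values = set(form.values())
    has_negative_item = any(item in form for item in negative_items)
    if len(raw_values) != 1 or not has_negative_item:
        return False
    (value,) = raw_values
    # A uniform midpoint rating is not contradictory: reversing it gives the same value.
    return reverse_score(value, scale_min, scale_max) != value

# Hypothetical completed forms; "q3_neg" is the negatively worded item
forms = {
    "form_a": {"q1": 5, "q2": 5, "q3_neg": 5},  # same column filled in throughout
    "form_b": {"q1": 5, "q2": 4, "q3_neg": 1},  # rating reversed on the negative item
    "form_c": {"q1": 3, "q2": 3, "q3_neg": 3},  # uniform midpoint, left unflagged
}
flagged = [name for name, form in forms.items()
           if is_straight_lined(form, negative_items={"q3_neg"})]
print(flagged)
```

Before computing item means, ratings on negatively worded items would normally be passed through reverse_score so that all items point in the same direction.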
Recommendations for Administration of Course Evaluations
1. Across all sections of a given course, standardize the amount of time allotted for students to complete their evaluations.
Some consensus should be reached among course instructors regarding the minimal amount of time needed for students to complete evaluations in a thoughtful fashion. Further temporal standardization could be achieved if instructors would agree on which particular time during the class period (e.g., the first or last 15 minutes of class) would be best for administering course evaluations. One argument against the common practice of having students complete their evaluations toward the end of a class period is that it could result in less carefully completed evaluations because students may be tempted to finish quickly and leave early.
2. Take into consideration the academic term or semester when course evaluations are to be administered.
One option is to administer the evaluations immediately after the final exam of the course. This provides two advantages:
- It allows students to truly assess the whole course because the final exam represents its last key component.
- Students are not likely to be absent on the day of the final exam, so a larger and more representative sample of students would be present to complete the course evaluation than if it were administered on a routine day of class.
However, a major disadvantage of administering evaluations immediately after students complete the final exam is that they are more likely to be preoccupied, anxious, or fatigued by the just-completed exam. This may result in evaluations that are filled out more hurriedly, with fewer written comments, and less overall validity. This disadvantage is a major one and probably outweighs the advantages associated with having students evaluate the course after completing the final exam.
Perhaps the best approach is for course instructors to agree to administer the evaluation instrument as close to the end of the course as possible (e.g., during the last week of the term), but not immediately after the final exam. Also, this approach would better accommodate those instructors who elect not to administer a final examination in the course.
3. Be aware of the “burn-out,” or fatigue factor.
One last consideration with respect to the issue of when course evaluations are administered is the "burn-out" or fatigue factor that may come into play if students are repeatedly required to fill out course evaluations in all their classes at the same time in the semester. To minimize the adverse impact of fatigue or boredom that may accompany completion of multiple course evaluations, it might be advisable to try to administer the course-evaluation instrument to students at a time that is not concurrent with administration of other course evaluation forms.
4. Across all sections of a given course, standardize the instructions read to students immediately before distribution of the evaluation forms.
Some research has shown that student ratings can be affected by the wording of instructions that are read to students just prior to administration of the evaluation instrument (Pasen, et al., 1978). For instance, students tend to provide more favorable or lenient ratings if the instructions indicate that the evaluation results will be used for decisions about the instructor's "retention and promotion," as opposed to students being told that the results will be used for the purpose of "course improvement" or "instructional improvement" (Braskamp & Ory, 1994; Feldman, 1979). Thus, instructions read to students in different sections of the course should be consistent (e.g., the same set of typewritten instructions read in each class).
5. Use instructions read prior to distribution of evaluation forms to prepare or prime students for the important role they play in evaluating college courses and to provide students with a positive "mental set".
To increase student enthusiasm for course evaluation and to improve the validity of the results obtained, include some or all of the following information in the instructions read to students prior to course evaluation.
- Remind students that evaluating the course is an opportunity for them to provide meaningful input that could improve the quality of the course for many future generations of first-year students.
- Explain to students why the evaluations are being conducted, for example, to help instructors improve their teaching and to improve the quality of the course. If items relating to specific course characteristics are to be used for instructional improvement purposes and global items for overall course-evaluation or instructor-evaluation purposes, then this distinction should be mentioned in the instructions.
- Assure students that their evaluations will be read carefully and taken seriously by the program director as well as the course instructor.
- Acknowledge to students that, although they may be completing numerous evaluations for all courses they are taking, they should still try to take the time and effort to complete the course-evaluation form thoughtfully and thoroughly.
- Remind students that they should avoid the temptation to give uniformly high or uniformly low ratings on every item, depending on whether they generally liked or disliked the course or the course instructor. Instead, remind them to respond to each item independently and honestly.
- Encourage students to provide written comments in order to clarify or justify their numerical ratings, and emphasize that specific comments are especially welcome because they often provide instructors with valuable feedback on course strengths and useful ideas for overcoming course weaknesses.
- Inform students about what will be done with their evaluations once they have completed them, assuring them that their instructor will not see their evaluation before grades have been turned in (Ory, 1990), and that their hand-written comments will be converted into typewritten form before they are returned to the instructor. (The latter assurance can alleviate students' fear that the instructor will recognize their handwriting; without this assurance, students may be inhibited about writing any negative comments on the evaluation form.)
6. If possible, standardize the behavior of instructors during the time when students complete their evaluations.
The importance of this practice is supported by research indicating that student ratings tend to be higher when the instructor remains in the room while students complete the course-evaluation form (Centra, 1993; Feldman, 1989; Marsh & Dunkin, 1992). The simplest and most direct way to eliminate this potential bias is for the instructor to be out of the room while students complete their evaluations (Seldin, 1993). This would require someone other than the instructor to administer the evaluations, such as a student government representative or a staff member.
Some faculty might resist this procedure, particularly if there is no precedent for it at the college. If such resistance is extreme and widespread, even after a clear rationale has been provided, then the following alternatives might be considered:
- The instructor asks a student to distribute the forms and then the instructor leaves the room while students complete their evaluations.
- The instructor stays in the room but does not circulate among the class while students complete their course evaluations; instead he or she remains seated at a distance from the students (e.g., at a desk in front of the class) until all evaluations have been completed.
Whatever the procedure used, the bottom line is that variations in how instructors behave while students complete course evaluations should be minimized. Instructor behavior during course evaluation is a variable that needs to be held constant so it does not unduly influence or "contaminate" the validity of student evaluations of the course.
Recommendations for Analyzing, Summarizing, and Reporting the Results of Course Evaluations
1. Report both the central tendency and variability of students' course ratings.
Two key descriptive statistics can effectively summarize student ratings:
- mean (average) rating per item, which summarizes the central tendency of student ratings, and
- standard deviation (SD) per item, which summarizes the variation or spread of student ratings for each item. Theall and Franklin (1991) succinctly capture the meaning and significance of including standard deviation in the analysis and summary of students' course ratings.
In addition to computing the means and SDs for student ratings received by individual instructors in their own course sections, these statistics can also be computed for all class sections combined, thereby allowing individual instructors to compare the mean and SD for ratings in their own section with the composite mean and standard deviation calculated for all sections. Computing section-specific and across-section (composite) means and SDs for each item on the evaluation instrument also allows for the application of statistical tests to detect significant differences between the instructor's section-specific ratings and the average rating of all course sections combined. The results of these significance tests could provide valuable information that can be used to diagnose where instructional improvement is needed. For instance, if an instructor's rating on an item is significantly below the collective mean for that item, it may suggest to the instructor that this is one aspect of his or her instruction that needs closer attention and further development. In contrast, if an instructor's mean rating on a given item is significantly above the overall mean for all course sections on that item, then this discrepancy suggests an instructional strength with respect to that particular course characteristic. What the instructor is doing to garner such a comparatively high rating might be identified and shared with other faculty who are teaching the course.
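The section-versus-composite comparison described above can be sketched as follows. This is a minimal illustration, not a recommended analysis plan: the ratings and section names are hypothetical, and Welch's t statistic is an assumed choice of test. Because a section's own ratings are part of the composite, the sketch compares each section with the remaining sections (also an assumption) so that the two samples are independent. It computes only the t statistic; judging significance would still require referring it to a t distribution, for example with a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

def summarize(ratings):
    """Return (mean, sample SD, n) for one item's ratings."""
    return mean(ratings), stdev(ratings), len(ratings)

def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two independent means."""
    mean_a, sd_a, n_a = summarize(sample_a)
    mean_b, sd_b, n_b = summarize(sample_b)
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# Hypothetical ratings on a single item, keyed by class section
sections = {
    "section_1": [4, 5, 4, 5, 4],
    "section_2": [2, 3, 2, 3, 3],
    "section_3": [3, 4, 3, 4, 3],
}

# Composite statistics across all sections combined
composite = [r for ratings in sections.values() for r in ratings]
composite_mean, composite_sd, composite_n = summarize(composite)

for name, ratings in sections.items():
    # Ratings from every section except the one being examined
    others = [r for other, rs in sections.items() if other != name for r in rs]
    section_mean, section_sd, _ = summarize(ratings)
    t = welch_t(ratings, others)
    print(f"{name}: mean={section_mean:.2f} SD={section_sd:.2f} "
          f"(composite mean={composite_mean:.2f}, SD={composite_sd:.2f}), t={t:.2f}")
```

A large positive t for a section points to a possible instructional strength on that item, and a large negative t to an area that may need attention, subject to the usual cautions about small class sizes.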
2. The identification and sharing of strategies for instructional improvement should be an essential component of course assessment, and it is a form of feedback that has been commonly ignored or overlooked when student ratings are used to evaluate college courses (Stevens, 1987; Cohen, 1990).
As Stevens contends, "The instructor must learn how to design and implement alternative instructional procedures in response to feedback, which means that a coherent system of instructional resources must be easily available to the instructor. Without such a system, the instructor may be unable to gain the knowledge or support that is necessary to effect change" (1987, p. 37).
One non-threatening way to provide course instructors with specific strategies for instructional improvement is to create opportunities for instructors to share concrete teaching practices that have worked for them. Strategies could be solicited specifically for each item on the evaluation form, and a compendium of item-specific strategies could then be sent to all instructors, ideally at the same time they receive the results of their course evaluations. In this fashion, instructors are provided not only with a descriptive summary of student-evaluation results, but also with a prescriptive summary of specific strategies they can use to improve their instructional performance with respect to each item on the evaluation instrument. Moreover, involving course instructors in the development and construction of these strategies serves to
- actively engage them in the quest to improve course instruction,
- increase their sense of ownership or control of the course-evaluation process, and
- treat them like responsible agents (rather than passive pawns) in the assessment process.
As Paul Dressel recommends, "Evaluation done with or for those involved in a program is psychologically more acceptable than evaluation done to them" (1976, p. 5).
The importance of providing specific teaching-improvement feedback to course instructors is underscored by research indicating that
- instructors prefer feedback that is specific and focused on concrete teaching behaviors (Murray, 1987; Brinko, 1993), and
- specific feedback is more effective for helping recipients understand their evaluation results and for helping them to improve their instructional performance (Goldschmid, 1978; Brinko, 1993).
As Wilson (1986) concluded following his extensive research on the effects of teaching consultation for improving instructors' course evaluations: "Items on which the greatest number of faculty showed statistically important change were those for which the suggestions were most concrete, specific and behavioral" (p. 209).
3. To maximize the opportunity for instructors to make instructional improvements while the course is still in progress, it is recommended that course evaluations be administered at midterm to obtain early feedback.
Cohen (1980) conducted a meta-analysis of 17 studies on the effectiveness of student-rating feedback for improving course instruction. He found that receiving feedback from student ratings during the first half of the semester was positively correlated with instructional improvement, as measured by the difference between student ratings received at midterm (before feedback was received) and ratings received at the end of the semester (after midterm feedback had been received). These findings are consistent with those reported by Murray and Smith (1989), who found that graduate teaching assistants in three different disciplines who received instructional feedback at midterm displayed higher pre- to post-test gains in student ratings than a control group of teaching assistants who did not receive midterm feedback.
Course instructors could initiate this early-feedback process if they administer student evaluations at midterm and then compare these results with those obtained at the end of the course, after some instructional change was made in response to students' midterm feedback. Thus, a pre-midterm to post-midterm gain in student ratings may be attributed to the particular instructional change that was implemented during the second half of the course. This is an example of the type of "classroom research" which has been strongly endorsed as a legitimate form of faculty scholarship (Boyer, 1991) and which serves to integrate educational research with instructional practice (Cross & Angelo, 1988).
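As a rough illustration of the midterm-to-final comparison, the sketch below computes the change in mean rating for each item between the two administrations. The item names, the ratings, and the `item_gains` helper are all hypothetical, and the sketch assumes the same items appear on both forms.

```python
from statistics import fmean

def item_gains(midterm, final):
    """Return {item: final mean - midterm mean} for items rated at both points."""
    return {
        item: round(fmean(final[item]) - fmean(midterm[item]), 2)
        for item in midterm
        if item in final
    }

# Hypothetical ratings (1-5 scale) from one section at midterm and at course end
midterm = {
    "clear_explanations": [3, 3, 4, 2, 3],
    "timely_feedback": [4, 4, 3, 4, 5],
}
final = {
    "clear_explanations": [4, 4, 5, 3, 4],
    "timely_feedback": [4, 4, 4, 4, 4],
}

gains = item_gains(midterm, final)
for item, gain in gains.items():
    print(f"{item}: change in mean rating = {gain:+.2f}")
```

Items with the largest gains would be the natural candidates for linking back to the specific instructional change made after midterm, while items showing no change suggest where that change had little effect.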
4. Compare evaluations to those provided for other first-year courses to gain a reference point for interpreting student perceptions of the course.
To ensure a fair basis of comparison and a valid reference point, compare student evaluations of the course with those of other courses of similar class size (e.g., a first-year course in English composition), because there is some evidence that class size can influence student ratings, with smaller classes tending to receive slightly higher average ratings than larger classes (Cashin, 1988; Feldman, 1984).
Also, depending on whether the course is required or an elective, it should be compared with other first-semester courses that have the same required or elective status, because research suggests that required courses tend to receive lower student ratings than elective courses (Braskamp & Ory, 1994; Marsh & Dunkin, 1992).
5. Surveys or questionnaires could also be used to obtain a different type of comparative perspective on the course: the retrospective perceptions of course alumni.
For example, new-student seminars often emphasize lifelong-learning and life-adjustment skills, so it might be revealing to assess how upper-division students or college alumni, reflecting back on the course, would respond to the following questions posed to them on a course survey:
- Do you view the seminar differently now than you did when you were a first-year student?
- What aspect of the seminar is most memorable or has had the most long-lasting impact on you?
- Do you still use any ideas or skills acquired during the new-student seminar in your educational, professional, or personal life?
6. Administer the course-evaluation instrument to students before the class begins and re-administer it at course completion to gain perspective on student changes in attitude or behavior between the start and finish of the course.
To assess change in students' attitudes, reported behaviors, or academic-skill performance between the beginning and end of the course, student responses given at the end of the course can be compared to those given before the course begins. This pre/post design can be created by administering an evaluation instrument, or selected items therefrom, to students on the first day of class so these responses can be used as a baseline (pre-test) against which their post-course (post-test) responses can be compared.
To increase the likelihood that pre- to post-course changes in student attitudes or behavior can be attributed to the course, rather than to personal maturation over time or to the college experience in general, students' pre- and post-course responses could be compared with the responses provided by other first-year students (at the beginning and end of the same semester) who do not participate in the course.
Applying a pre/post design to both course participants (treatment group) and non-participants (control group) provides two bases of comparison for assessing course impact:
- between-groups (treatment-control) comparison, and
- between points-in-time (longitudinal) comparison.
Adding a control group of non-participants to the pre/post design creates a more powerful research design that can increase the probability and validity of drawing causal (cause-effect) inferences about course impact on student behavior or attitudes, because it controls for the confounding effects of pre- to post-course change that may simply occur due to maturation. As Pascarella and Terenzini (1991) point out with respect to the impact of college on students,
[In] the simple pretest-posttest longitudinal design, the same panel or sample of students is followed over a specified period of time and measured on the same instrument. The students are essentially their own control group, and the difference between mean freshman and senior scores on some measure of interest is used as an estimate of college. Unfortunately, such mean changes may reflect not only the influence of college but also the effects of confounding noncollege influences. The most troublesome confounding variable associated with simple longitudinal panel designs having no control group is that of age or maturation (p. 661).
If the pre/post design includes a sample of first-year students of the same age who do not experience the course, this group may serve as an effective control for the confounding effects of maturation, thereby creating a viable methodological tool for assessing the potential causal influence of a first-year course on student outcomes.
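One simple way to combine the between-groups and between-points-in-time comparisons is to subtract the control group's pre- to post-course change from the course participants' change, a calculation often called a difference-in-differences estimate. The sketch below is offered only as an illustration under stated assumptions: the scores are hypothetical, the function name is invented for the example, and the essay itself does not prescribe this particular calculation.

```python
from statistics import fmean

def difference_in_differences(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Return (participant change, control change, difference between the two)."""
    treat_change = fmean(treat_post) - fmean(treat_pre)  # course + maturation
    ctrl_change = fmean(ctrl_post) - fmean(ctrl_pre)     # maturation alone
    return treat_change, ctrl_change, treat_change - ctrl_change

# Hypothetical self-reported study-skill scores (1-5 scale)
seminar_pre = [2, 3, 3, 2, 3]
seminar_post = [4, 4, 3, 4, 4]
control_pre = [3, 2, 3, 3, 2]
control_post = [3, 3, 3, 3, 3]

treat_change, ctrl_change, did = difference_in_differences(
    seminar_pre, seminar_post, control_pre, control_post
)
print(f"participants' change: {treat_change:+.2f}")
print(f"non-participants' change: {ctrl_change:+.2f}")
print(f"change beyond the control group's: {did:+.2f}")
```

The control group's change serves as the estimate of what maturation and the general college experience would have produced without the course; only the remainder is a candidate for attribution to the course, and even that depends on how comparable the two groups were at the outset.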
References
Abrami, P. C. (1989). How should we use student ratings to evaluate teaching? Research in Higher Education, 30(2), 221-227.
Abrami, P. C., d'Apollonia, S., & Rosenfield, S. (1997). The dimensionality of student ratings of instruction: What we know and what we do not. In J. Smart (Ed.), Higher education: Handbook of theory and research, Vol. II. New York: Agathon Press.
Abrami, P. C., Perry, R. P., & Leventhal, L. (1982). The relationship between student personality characteristics, teacher ratings, and student achievement. Journal of Educational Psychology, 74(1), 111-125.
Aleamoni, L. M. (1987). Techniques for evaluating and improving instruction. New Directions for Teaching and Learning, No. 31. San Francisco: Jossey-Bass.
Aleamoni, L. M. & Hexner, P. Z. (1980). A review of the research on student evaluation and a report on the effect of different sets of instructions on student course and instructor evaluation. Instructional Science, 9(1), 67-84.
Arreola, R. A. (1983). Establishing successful faculty evaluation and development programs. In A. Smith (Ed.), Evaluating faculty and staff. New Directions for Community Colleges, No. 41. San Francisco: Jossey-Bass.
Arreola, R. A., & Aleamoni, L. M. (1990). Practical decisions in developing and operating a faculty evaluation system. In M. Theall & J. Franklin (Eds.), Student ratings of instruction: Issues for improving practice (pp. 37-56). New Directions for Teaching and Learning, No. 43. San Francisco: Jossey-Bass.
Astin, A. W. (1991). Assessment for excellence: The philosophy and practice of assessment and evaluation in higher education. New York: Macmillan.
Barefoot, B. O. (Ed.) (1993). Exploring the evidence: Reporting outcomes of freshman seminars. (Monograph No. 11). Columbia, SC: National Resource Center for The Freshman Year Experience, University of South Carolina.
Barefoot, B. O., & Fidler, P. P. (1992). Helping students climb the ladder: 1991 national survey of freshman seminar programs. (Monograph No. 10). Columbia, SC: National Resource Center for The Freshman Year Experience, University of South Carolina.
Barefoot, B. O., & Fidler, P. P. (1996). The 1994 survey of freshman seminar programs: Continuing innovations in the collegiate curriculum. (Monograph No. 20). National Resource Center for The Freshman Year Experience & Students in Transition, University of South Carolina.
Boyer, E. L. (1991). Scholarship reconsidered: Priorities of the professoriate. Princeton, NJ: Carnegie Foundation for the Advancement of Teaching.
Braskamp, L. A., & Ory, J. C. (1994). Assessing faculty work: Enhancing individual and institutional performance. San Francisco: Jossey-Bass.
Brinko, K. T. (1993). The practice of giving feedback to improve teaching: What is effective? Journal of Higher Education, 64(5), 574-593.
Brower, A. M. (1990). Student perceptions of life task demands as a mediator in the freshman year experience. Journal of the Freshman Year Experience, 2(2), 7-30.
Brower, A. M. (1994). Measuring student performances and performance appraisals with the College Life Task Assessment instrument. Journal of the Freshman Year Experience, 6(2), 7-36.
Cashin, W. E. (1988). Student ratings of teaching: A summary of the research. IDEA Paper No. 20. Manhattan, Kansas: Kansas State University, Center for Faculty Evaluation and Development. (ERIC Document Reproduction No. ED 302 567).
Cashin, W. E. (1990). Students do rate different academic fields differently. In M. Theall & J. Franklin (Eds.), Student ratings of instruction: Issues for improving practice (pp. 113-121). New Directions for Teaching and Learning, No. 43. San Francisco: Jossey-Bass.
Centra, J. A. (1977). Student ratings of instruction and their relationship to student learning. American Educational Research Journal, 14(1), 17-24.
Centra, J. A. (1993). Reflective faculty evaluation: Enhancing teaching and determining faculty effectiveness. San Francisco: Jossey-Bass.
Cohen, P. A. (1981). Student ratings of instruction and student achievement: A meta-analysis of multisection validity studies. Review of Educational Research, 51(3), 281-309.
Cohen, P. A. (1986). An updated and expanded meta-analysis of multisection student rating validity studies. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, 1986.
Cohen, P. A. (1990). Bringing research into practice. In M. Theall & J. Franklin (Eds.), Student ratings of instruction: Issues for improving practice. New Directions for Teaching and Learning, No. 43. San Francisco: Jossey-Bass.
Costin, F., Greenough, W., & Menges, R. (1971). Student rated college teaching: Reliability, validity, and usefulness. Review of Educational Research, 41(5), 511-535.
Cross, K. P., & Angelo, T. A. (1988). Classroom assessment techniques: A handbook for faculty. National Center for Research to Improve Postsecondary Teaching and Learning. Ann Arbor: University of Michigan.
d'Apollonia, S., & Abrami, P. C. (1997). In response . . . . Change, 29(5), 18-19.
Dressel, P. L. (1976). Handbook of academic evaluation. San Francisco: Jossey-Bass.
Feldman, K. A. (1977). Consistency and variability among college students in rating their teachers and courses: A review and analysis. Research in Higher Education, 6(3), 233-274.
Feldman, K. A. (1979). The significance of circumstances for college students' ratings of their teachers and courses. Research in Higher Education, 10(2), 149-172.
Feldman, K. A. (1984). Class size and college students' evaluations of teachers and courses: A closer look. Research in Higher Education, 21(1), 45-116.
Feldman, K. A. (1988). Effective college teaching from the students' and faculty's view: Matched or mismatched priorities? Research in Higher Education, 28(4), 291-344.
Feldman, K. A. (1989). Instructional effectiveness of college teachers as judged by teachers themselves, current and former students, colleagues, administrators, and external (neutral) observers. Research in Higher Education, 30(2), 137-194.
Fidler, D. S. (1992). Primer for research on the freshman year experience. National Resource Center for The Freshman Year Experience, University of South Carolina.
Hartman, N. A., & former University 101 students (1991, February). Celebrating the freshman year: A retrospection. Presentation made at the annual conference of The Freshman Year Experience, Columbia, South Carolina.
Hays, N. L. (1973). Statistics for the social sciences (2nd ed.). New York: Holt, Rinehart, & Winston.
Howard, G. S., & Maxwell, S. E. (1980). Correlation between student satisfaction and grades: A case of mistaken causation. Journal of Educational Psychology, 72(6), 810-820.
Howard, G. S., & Maxwell, S. E. (1982). Do grades contaminate student evaluations of instruction? Research in Higher Education, 16, 175-188.
Jacobi, M. (1991). Focus group research: A tool for the student affairs' professional. NASPA Journal, 28(3), 195-201.
Knapp, J., & Sharon, A. (1975). A compendium of assessment techniques. Princeton, N.J.: Educational Testing Service.
Malaney, G. D., & Weitzer, W. H. (1993). Research on students: A framework of methods based on cost and expertise. NASPA Journal, 30(2), 126-137.
Marsh, H. W. (1984). Students' evaluations of university teaching: Dimensionality, reliability, validity, potential biases, and utility. Journal of Educational Psychology, 76(5), 707-754.
Marsh, H. W., & Dunkin, M. (1992). Students' evaluations of university teaching: A multidimensional perspective. In J. C. Smart (Ed.), Higher education: Handbook of theory and research (Vol. 8, pp. 143-233). New York: Agathon.
Marsh, H. W., & Ware, J. E., Jr. (1982). Effects of expressiveness, content coverage and incentive on multidimensional student rating scales: New interpretations of the Dr. Fox effect. Journal of Educational Psychology, 74(1), 126-134.
McCallum, L. W. (1984). A meta-analysis of course evaluation data and its use in the tenure decision. Research in Higher Education, 21(2), 150-158.
McKeachie, W. J. (1979). Student ratings of faculty: A reprise. Academe, 65(6), 384-397.
McKeachie, W. J., & Kaplan, M. (1996). Persistent problems in evaluating college teaching. AAHE Bulletin, 48(6), 5-8.
Mentkowski, M., Astin, A. W., Ewell, P. T., & Moran, E. T. (1991). Catching theory up with practice: Conceptual frameworks for assessment. The AAHE Assessment Forum. Washington, D.C.: American Association for Higher Education.
Murray, H. G. (1987, April). Impact of student instructional ratings on quality of teaching in higher education. Paper presented at the 71st annual meeting of the American Educational Research Association, Washington, D.C.
Murray, H. G., & Smith, T. A. (1989, March). Effects of midterm behavioral feedback on end-of-term ratings of instructional effectiveness. Paper presented at the annual conference of the American Educational Research Association, San Francisco.
Ory, J. C. (1990). Student ratings of instruction: Ethics and practice. In M. Theall & J. Franklin (Eds.), Student ratings of instruction: Issues for improving practice (pp. 63-74). New Directions for Teaching and Learning, No. 43. San Francisco: Jossey-Bass.
Overall, J. U., & Marsh, H. W. (1980). Students' evaluations of instruction: A longitudinal study of their stability. Journal of Educational Psychology, 72(3), 321-325.
Pace, C. R. (1990). College student experiences questionnaire (3rd edition). Los Angeles: University of California, Los Angeles, Center for the Study of Evaluation. (Available from the Center for Postsecondary Research and Planning, Indiana University).
Pace, C. R., & Kuh, G. D. (1998). College student experiences questionnaire (4th edition). Center for Postsecondary Research and Planning. Bloomington: Indiana University.
Pascarella, E. T., & Terenzini, P. T. (1991). How college affects students: Findings and insights from twenty years of research. San Francisco: Jossey-Bass.
Pasen, R. M., Frey, P. W., Menges, R. J., & Rath, G. (1978). Different administrative directions and student ratings of instruction: Cognitive vs. affective effects. Research in Higher Education, 9(2), 161-167.
Robinson, P. W., & Foster, D. F. (1979). Experimental psychology: A small-N approach. New York: Harper & Row.
Seldin, P. (1992). Evaluating teaching: New lessons learned. Keynote address presented at "Evaluating Teaching: More Than a Grade" conference held at the University of Wisconsin-Madison, sponsored by the University of Wisconsin System, Undergraduate Teaching Improvement Council.
Seldin, P. (1993). How colleges evaluate professors, 1983 vs. 1993. AAHE Bulletin, 46(2), 6-8, 12.
Sixbury, G. R., & Cashin, W. E. (1995). IDEA technical report no. 9: Description of database for the IDEA Diagnostic Form. Manhattan, KS: Kansas State University, Center for Faculty Evaluation and Development.
Stevens, J. J. (1987). Using student ratings to improve instruction. In L. M. Aleamoni (Ed.), Techniques for evaluating and improving instruction (pp. 33-38). New Directions for Teaching and Learning, No. 31. San Francisco: Jossey-Bass.
Theall, M., & Franklin, J. (1991). Using student ratings for teaching improvement. In M. Theall & J. Franklin (Eds.), Effective practices for improving teaching (pp. 83-96). New Directions for Teaching and Learning, No. 48. San Francisco: Jossey-Bass.
Theall, M., Franklin, J., & Ludlow, L. H. (1990). Attributions and retributions: Student ratings and the perceived causes of performance. Paper presented at the annual meeting of the American Educational Research Association, Boston.
Tregarthen, T., Staley, R. S., & Staley, C. (1994, July). A new freshman seminar course at a small commuter campus. Paper presented at the Seventh International Conference on The First Year Experience, Dublin, Ireland.
Turner, J. C., Garrison, C. Z., Korpita, E., Waller, J., Addy, C., Hill, W. R., & Mohn, L. A. (1994). Promoting responsible sexual behavior through a college freshman seminar. AIDS Education and Prevention, 6(3), 266-277.
Wilson, R. C. (1986). Improving faculty teaching: Effective use of student evaluations and consultants. Journal of Higher Education, 57(2), 196-211.