Archived Information

Strengthening the Standards: Recommendations for OERI Peer Review - October 1999

IV. Findings

This chapter summarizes study findings in three areas: the substantive fit between reviewers and applications; the quality of the peer reviews; and a more in-depth examination of the review process of six panels, including four fiscal year (FY) 1997 field-initiated studies (FIS) competition panels nominated by institute staff as particularly successful and two center competition panels, one nominated as successful and one as problematic.

Substantive Fit Between Reviewers and Applications

Methods

To determine how well the peer reviewers reflected the intent of the standards, we conducted an exercise to match application content, theory, and methods with the background and experience of the individuals who conducted the reviews. Because of time and data collection constraints, data on peer reviewers' credentials were limited to the resumes they submitted to the Office of Educational Research and Improvement (OERI) at the time of the reviews. While it would have been desirable to obtain additional and more detailed background information, in many, if not most, instances those resumes were also the main data available to the OERI staff in making reviewer selections. In a few cases, the documentation submitted by the reviewer was not a resume (e.g., a press release or bio) or was an abbreviated resume.

For each of the panels and applications selected through the stratified random selection procedure described in chapter 1, we requested from OERI the resumes of all three peer reviewers. We then examined the applications and peer reviewer credentials (those we obtained) for 12 FIS panels for FY 1997 (29 applications and 35 reviewers) and 6 FIS panels for FY 1996.1 We also examined the fit between applications and reviewer credentials for those five center competitions for which we had the necessary information. We then constructed data displays resulting from this inquiry, in which the content of each application and data on peer reviewers' education, current position, and research experience were briefly summarized. For each application, a short statement about the fit between the two was prepared.

In general, we used the following approach in our assessment. If an individual had a doctorate, we looked at the field of the doctorate and the individual's publications. If the field generally required research for a doctorate (e.g., educational psychology or education research), we assumed that the individual had a research background. If the field might or might not require original research (e.g., curriculum and instruction, educational administration), we looked at positions held and publications as well. We did the same for individuals without doctorates. Sometimes it was impossible to determine whether an individual had a research background because publications were incomplete or titles did not yield sufficient information, in which case that limitation was noted in the assessment. We also found that it was largely impossible to determine expertise in methods or theory from the resumes. The most we could learn was whether the reviewer had conducted roughly similar types of research in the broad subject area of the competition. We did not attempt a detailed match between reviewer credentials and specific applications. Not only were the data insufficient to permit that type of analysis, but we recognized that to do so would impose a much stricter rule than is suggested in the standards.2

Reviewers were also interviewed regarding possible concerns about serving as reviewers (e.g., lack of knowledge of the subject area and/or methods, conflict of interest, timing of reviews); whether the concerns had been addressed satisfactorily by U.S. Department of Education (ED) staff; the extent to which the subject area of the competition was described in sufficient detail for them to determine whether they were qualified to review applications; and their assessment of their own and their fellow panel members' qualifications for serving as reviewers (e.g., familiarity with subject area, proposed methods, scope of the design).

Findings from the Review of Resumes

While most of the reviewers in the sample had conducted research in education, a sizeable minority had not. We focus here on the FY 1997 FIS reviewers, the group for which the greatest amount of systematic information was available. Of the 35 reviewers on 12 panels whose resumes were reviewed, 17 appeared to be educational researchers, and an additional 6 may well have had research experience, although the resumes for these individuals were insufficient to make that determination (e.g., their resumes showed they had doctorates, but their publications were missing). The remaining 12 individuals (about a third) did not indicate any research experience or publications on their resumes. They included persons who had served as teachers, school administrators, state officials, tribal officials, teacher trainers, and university administrators. Most had a solid background in education policy or practice and familiarity with the general subjects being reviewed (early childhood education or science education), but did not meet the criterion of having studied and conducted research in the general field in which they were reviewing applications.

Among the 23 individuals who had or may have had research training and experience, most had that experience in broad areas related to the competitions. For example, individuals with backgrounds in early childhood education were likely to review applications in that area. Furthermore, subject area fit often extended to a more detailed level, for example, with individuals knowledgeable about science education reviewing science education applications within a larger field (student achievement). This level of fit did not always extend to methods, however. For example, individuals who had little or no experience in studying large-scale program or policy implementation evaluated applications aimed at studying the longitudinal effects of curriculum or policy reforms.

Across institutes, perhaps the most common area in which research experience appeared to be lacking was the design and conduct of evaluations. Of the 29 applications we matched with reviewers, 10 were evaluation studies in whole or in part. These applications proposed studies ranging from small-scale experimental design tests of new curricula to large-scale testing of interventions with nationally representative samples of children, teachers, or others. Judging from their experience and publications, few of the individuals who reviewed those applications appeared to have conducted evaluations themselves, let alone experiments or studies requiring elaborate sampling designs. The reviewers selected were likely to be familiar with the subject of the evaluation (early childhood education or at-risk youth), but they did not appear to be as familiar with the types of studies commonly proposed—studies using experimental or quasi-experimental designs to conduct evaluations.

Saying that up to 23 of 35 reviewers had training and/or experience in the research areas of the competition does not imply that they were all experts in those fields, however. Of the 17 who were clearly education researchers, 2 indicated that their Ph.D.s were (or would be) earned in 1997, meaning they may not have had those degrees at the time they reviewed the applications. In addition, several of the reviewers with research experience had only limited amounts of experience in research; we counted as researchers individuals with a research dissertation or a few research publications. The standards speak of "expertise," but it is impossible to make that assessment from resumes.

Most of the panels for the center competitions for which resumes were available appeared to be qualified to review the center applications. Not all the members of each panel held doctorates or had conducted research in the competition subject area, but most on each panel met these criteria. The exceptions among the five panels were: the Postsecondary Improvement Center, for which most of the reviewers had conducted some research, but only a minority appeared to have studied reform in postsecondary education (although two had extensive backgrounds in research on occupational training); and the Adult Literacy Center, for which only a minority of the reviewers had either doctorates or research experience. We were provided six of the nine reviewers' resumes, so it is possible that the data on the missing reviewers would alter this conclusion.

Findings from the Interviews

Of the 14 reviewers surveyed who had participated in the FY 1996 FIS competition, 12 had no concerns about serving as a reviewer at any time in the process. One had concerns initially about serving because of a potential conflict of interest, but those concerns were adequately addressed by ED staff. One reviewer was concerned about his/her lack of knowledge in the specific subject area of the competition and about the timing of the competition (not having adequate time to prepare for the review), but was told by the Department that there was no problem with regard to subject matter knowledge because he/she had been recruited for policy rather than specific subject matter expertise.

All but 1 of the 14 reviewers said the subject area was described in sufficient detail for them to make a determination as to whether they were qualified to serve.

With regard to reviewers' assessment of their own expertise, 13 reviewers reported that their expertise was appropriate. One expressed concern because of lack of specific expertise in the subject areas of the range of proposals to be reviewed.

Reviewers' assessment of the expertise of their fellow panelists was mixed. Of the five panelists who participated in panel discussions (FY 1996 was mostly a mail-out review, with panel discussion only for applications with discrepant ratings), two reviewers found their fellow panelists' expertise satisfactory, while three were more critical of the expertise of their fellow reviewers. Of these three, two cited lack of expertise in research design and methodology as their concern. One reviewer stated that a fellow panelist lacked objectivity, had his/her own "agenda," and had not devoted proper time to reviewing applications prior to the panel meeting. One reviewer was concerned because the other reviewers lacked practical experience and were not knowledgeable about the subject area of the applications.

Of the 26 reviewers surveyed who had participated in the FY 1997 FIS competition, 22 had no concerns about serving as a reviewer at any time in the process. Three had concerns initially about serving because of a potential conflict of interest, but their concerns were adequately addressed by ED staff. One reviewer was concerned about his/her lack of research knowledge but was told by ED staff that this was not a problem because at least one other member of the panel would have this expertise.

Almost all reviewers (23) stated that the subject area of the competition was described in sufficient detail for them to determine whether they were qualified to serve. One reported there was sufficient information once the materials had arrived. One reported there appeared to be enough information initially, but in retrospect (after completing the reviews), he/she realized there was not sufficient information. One stated there was not sufficient detail.

With regard to reviewers' assessment of their own expertise, 16 reviewers reported their expertise was appropriate. Two reported it was appropriate because they were working on a panel with other reviewers who complemented their expertise. Three expressed concerns because of a lack of background in research design and methodology. Three stated that they lacked specific expertise in the subject areas of the range of proposals they had to review. Two reported that their expertise was for the most part appropriate.

As in FY 1996, reviewers' assessment of the expertise of their fellow panelists was mixed. Seventeen reviewers expressed satisfaction. In fact, they described the panel expertise in glowing terms—"a tapestry of expertise that overall covered the area very well," "the team complemented each other," "each brought different strengths," "outstanding team expertise," and "other reviewers had research background that complemented my practical experience." These comments indicate that reviewers made their assessment based on expertise across the three reviewers, rather than on each reviewer's having expertise in all areas (as specified by the standards; see chapter 2). Nine reviewers were more critical of the expertise of their fellow reviewers. Seven of these nine cited a lack of expertise in research design and methodology as their concern. One reviewer was concerned because a fellow panel member had not read the proposals prior to the review, but rated them anyway, and one reviewer provided no reason for his/her assessment of a lack of expertise among other panelists.

Of the 14 reviewers who participated in the center competitions, 9 had no concerns about serving as a reviewer at any time in the process. Five had concerns because of the short lead-time they had been given prior to the review.

All but 1 of the 14 reviewers said the subject area was described in sufficient detail for them to determine whether they were qualified to serve. One reported there was insufficient information but that prior knowledge of the center activities helped. With regard to reviewers' assessment of their own expertise, all reviewers reported that their expertise was appropriate.

Reviewers' assessment of the expertise of their fellow panelists on the whole was quite positive. Eleven reviewers were very satisfied with this expertise, and many commented positively on the "variety of points of view, backgrounds, or perspectives of the panel." Three panelists expressed concern about their fellow panelists, citing lack of knowledge of research design and methodology, lack of subject area knowledge, bias, failure to read the applications prior to the meeting, or modest participation during the panel meeting.

Evaluation of Reviews

Methods

To determine the quality of the reviews, we read 141 reviews produced for sampled applications in the 1996 and 1997 FIS competitions, as well as 5 center competitions. We read at least 2, and usually all 3, reviews for each of the sampled FIS applications.3

To guide our reading and assessment of the reviews, we produced a one-page evaluation sheet listing a series of questions about the review:

Are comments concise (i.e., does the reviewer provide brief, precise, specific, and persuasive arguments about the design and methods of the proposed research)?
Does the review make clear whether the research is or is not likely to yield valid and useful information?
Are the comments related to the evaluation criteria?
Are the comments consistent with scores (i.e., high scores are accompanied by largely positive comments, low scores are accompanied by largely negative comments, and mixed comments are provided for mixed scores)?
Are the comments sufficiently elaborated (i.e., is the reviewer's judgment amply and expertly justified)?
Are there any additional comments?

Concise comments, comments related to evaluation criteria, and comments consistent with scores are all required by the standards. We added the additional questions to provide a more complete picture of the quality of the reviews.

It is important to note the specific evaluation factors selected for the grant competitions (see chapter 2). The broad criteria were the same for all competitions and included national significance, the quality of the project design, the quality and potential contribution of personnel, the adequacy of resources, and the quality of the management plan. The specific factors under each criterion were different; they appear in appendix C.

In all competitions, reviewers were asked to provide written comments under each broad criterion, as well as scores. Comments for each criterion were to be divided into "strengths" and "weaknesses." In the hardcopy version of the review form, one page was provided for each broad criterion, with equal space provided for strengths and weaknesses. In addition, a final page, which appeared to be optional,4 gave reviewers an opportunity to make overall comments in support of their recommendations, describing strengths and weaknesses, as well as providing suggestions for improving the project in future submissions.

In addition, applicants were interviewed to determine their assessment of the quality of the reviews they received. They were asked how extensive and useful they found the written comments based on the reviewers' analysis of the strengths and weaknesses of the application with respect to each of the application criteria, and the extent to which reviews demonstrated expertise and familiarity with policy and practice in the field of education, as well as in-depth knowledge of theoretical perspectives or methodological approaches relevant to the subject of the competition.

Findings from the Review of Applications

In terms of breadth of coverage, most reviews met the letter of the standards in that they provided short comments on each broad evaluation criterion, the comments were related to the evaluation criteria, and the scores reflected the comments. To the extent that concise comments were provided, they were most often found under the first two criteria—national significance and design. The other evaluation criteria received much less comment.

Having noted the breadth of the reviews, it is important to note as well that most provided little depth. With respect to both national significance and project design, most reviewer descriptions of application strengths did little more than identify or document what was included in the application, sometimes accompanied by a summary judgment on its quality, sometimes not. In describing application weaknesses, reviewers were more likely to express independent judgment in the area of design than in the area of national significance. For example, 30 of the FIS reviews included a relatively detailed discussion of design weaknesses; most of these comments drew on independent or personal knowledge, not solely on the application. Although there was less detailed description of weaknesses than strengths under national significance, some reviewers considered such weaknesses in detail. Finally, on the whole, other sections of the reviews (staffing, budget, management plan) generated little detailed discussion.

We also categorized reviews as "good," "bad," or "indifferent" based on their breadth and depth of coverage. Of the 79 reviews conducted in FY 1997, about one-third were good: they were detailed assessments of an application's strengths and weaknesses that displayed the reviewer's knowledge as it was brought to bear on the application. Another 20 percent of the reviews could be characterized as poor: they misstated or simply did not reflect an understanding of the application content; were so poorly written that it was impossible to know what positions they were taking or advice they were providing; or ignored the research components of research projects, focusing instead on the application's intervention as a program or demonstration. Indifferent reviews accounted for about half of the reviews we read: they were fragmentary, and they listed elements in the design or in the needs or theory sections of applications as strengths, with little support for that designation—sometimes adding a short judgment, sometimes not. They offered little if any independent insight from the reviewer. These reviews may have stated one or two critical points (i.e., weaknesses) for a given criterion, but those points were often minor or provided information any observant reader could offer about omitted or incomplete items, such as "instruments not included," or "objectives not clearly stated." Indifferent reviews rarely provided guidance to the applicants about how to improve their applications.

Although far fewer applications were reviewed for the FY 1996 FIS, the quality of those reviews appeared to be somewhat better than that of the FY 1997 FIS reviews. Of the 21 reviews we examined, almost half could be considered good by our criteria; the remainder were largely indifferent.

While some center competition reviews were detailed, far too many fell into the indifferent category. About half of the 41 center reviews we examined were relatively detailed; the other half were brief, and provided mainly the same types of short descriptive or normative statements as those described above for the FIS reviews. As with the FIS reviews, the attention to criteria other than national significance and design was minimal. Few reviewers wrote more than perfunctory comments on personnel, resources, and management. Lack of attention to management is of particular concern because centers are often composed of researchers drawn from many institutions and require rigorous management.

Findings from the Interviews

Applicants' responses to the questions posed to them varied a great deal within panels, across panels in a given institute, and across institutes. Nor was there any clear pattern by panel or by proposal status (acceptance versus rejection). For this reason, the data are presented by competition only.

Overall, applicant assessment of the reviews was mixed.5 In terms of applicants' assessment of the usefulness and extensiveness of the reviews, 8 applicants (of 34) in the FIS competitions gave the reviews low ratings on this criterion, 16 applicants (of 34) gave reviews mixed ratings, and about 10 (of 34) rated them high. Ratings on the same criterion for center applicants were 9 (of 17) poor, 5 (of 17) mixed, and 3 (of 17) high.

The reasons for the FIS and center applicants' negative or mixed assessments were varied and included disagreement with the comments, comments that were considered superficial or irrelevant, no comments about design, lack of examples, comments that were illegible, limited explanation for comments, proposal not carefully read, large discrepancies among reviewer comments, reviewer comments too similar to each other, and summary statements that did not mesh with comments in individual categories (e.g., technical quality, national significance).

In terms of the extent to which applicants considered that reviewers demonstrated appropriate expertise, 21 (of 34) FIS applicants rated reviewers low or mixed on expertise, and 9 (of 34) rated them as high.6 Among center applicants, 14 (of 17) rated reviewers low on expertise and 3 (of 17) as high.

Applicants gave reviewers mixed or poor ratings for a variety of reasons, including failure to understand the significance or value of the proposal; disagreement with reviewer comments about personnel, time, budget, or national significance; comments that reflected poor understanding of research and methodology; comments that did not address information presented in applications; lack of understanding of the substantive focus of applications; lack of understanding of budget, personnel, and management issues, resulting in comments that were inappropriate; and reviewer having his/her own agenda focused on a particular population. The most prevalent concerns, expressed by nine applicants, related to reviewers' lack of understanding of research design and methodology.

Case Studies

Methods

In addition to our overall picture of the competitions, we examined the review process of six panels in greater depth. The focus of this in-depth examination was four FY 1997 FIS competition panels nominated by institute staff as particularly successful and two center competition panels—one nominated as successful and one as problematic.7 This section of the report summarizes the review process in these six cases from the perspectives of reviewers and applicants. In a sense, this discussion highlights what can be learned from panels that OERI staff consider indicative of good practice in the review of applications (the four FIS panels and one center competition were designated as exemplary by OERI staff), as well as problems that have arisen in carrying out the review of grant applications (one center competition was designated as problematic by staff).

Findings

In general, reviewers found the instructional information they received pertaining to the review useful, but most indicated that the review of applications took far longer than the estimates provided by OERI staff. It was not uncommon for reviewers to spend several days or a week or more reviewing 10 FIS or center applications. Most reviewers said they felt they were qualified to review the applications, although a few nonresearchers were unsure in this regard.

Most reviewers found the panel meetings useful and collaborative. Although several FY 1997 FIS reviewers said the orientation meeting and logistics were confusing, almost all found their individual panel discussions enlightening. As discussed earlier, however, a minority was concerned that other panel members did not have the requisite skills and that nonresearchers were too impressionable. OERI staff were described favorably by most reviewers.

Applicants were generally positive about the application packages and the evaluation criteria. Some indicated that the invitational priorities played a role in their decision to apply, but others said they did not consider those priorities. Some took issue with page restrictions or other constraints that did not have to do with content.

Applicants' assessments of the quality of the review comments were very mixed. Not only were unsuccessful applicants less than positive, but some of the successful applicants did not find the review comments useful. Among the issues noted were very short or cryptic comments, comments that provided inaccurate information, and comments that did not appear to fit with scores or rankings.


1 We did not examine one postsecondary FIS panel because one of the authors of this report was a consultant on an application reviewed by that panel.
2 An OERI staff member has pointed out that reviewer credentials may not match applications when the applications are not immediately related to the institute's focus. This is undoubtedly the case in some instances, but because we were not seeking a perfect fit between reviewers and applications, this problem was not a major issue for our review.
3 Our sampling plan called for sampling reviewer comments on each application, but institutes often supplied us with all three reviews for an FIS application, so we read all three. For the five center competitions, we read all reviews for sampled applications in one of the competitions, but only the reviews of sampled reviewers in the other four.
4 We assume it was optional because many reviewers did not fill it out.
5 It should be noted that in some cases, applicants did not remember the reviews well enough to comment or could not tell from reviewers' comments whether the reviewer demonstrated expertise.
6 Four applicants did not comment.
7 We did not ask staff to identify problematic FIS panels, and we did not examine one of the successful FIS panels because a member of the review team was an applicant to that panel.
