V. Study Conclusions and Recommendations
As noted in chapter 1, the charge of this study was to examine: (1) whether the Office of Educational Research and Improvement (OERI) peer review standards are appropriate and useful, (2) whether they contribute to fair and high-quality competitions, and (3) how the competitions conducted under the standards have operated and how they might be improved. This chapter begins by examining the areas in which the standards are appropriate and useful, and as such contribute to fair and high-quality competitions. It then reports findings on how the competitions have operated under the standards and provides recommendations in eight key areas for improving both the competitions and the standards.
Appropriateness and Usefulness of the Standards
The important elements of the OERI standards (Federal Register, September 14, 1995: 47808-47813) include the qualifications of peer reviewers who review applications for grants and cooperative agreements, the rules for conducting the competitions, and the process for ranking and selecting applications (see chapter 2).
With regard to the qualifications of peer reviewers, the panel finds that the standards specifying the individual qualifications of peer reviewers are appropriate and useful because those standards ensure that all reviewers have both research and policy or practice experience. The qualifications specified include: demonstrated expertise, including training and experience, in the subject area of the competitions; in-depth knowledge of policy or practice in education; and in-depth knowledge of theoretical perspectives or methodological approaches in the subject area of the competition.
With regard to the rules for conducting competitions, the panel finds many of the current provisions of the standards appropriate and useful because they standardize procedures across the institutes and make the procedures explicit. These provisions include the following: rules that specify when peer review is to be used (for the review and evaluation of all applications for grants and cooperative agreements that exceed $100,000); the criterion of at least three reviewers for grant or cooperative agreement awards above $50,000; the requirement that reviewers "must be given a number of applications to evaluate;" instructions to "independently evaluate" and rate (i.e., score) each application, and to base evaluations and ratings on evaluation criteria and weights assigned to each criterion; the provision that reviewers must accompany their evaluations with "concise written comments based on the reviewer's analysis of the strengths and weaknesses of the application with respect to each of the applicable evaluation criteria;" and the requirement that after the independent evaluation/rating, reviewers who evaluated a "common set" of applications are to convene and discuss the strengths and weaknesses of those applications, and "each reviewer may then independently reevaluate and re-rate an application with appropriate changes made to the written comments."
With regard to the steps that must be taken before awards are made, the panel supports the latitude given to the Secretary with respect to making awards after the peer review process has been completed. The standards allow the Secretary to consider a wide array of information in making awards, including "an applicant's ranking; recommendations of the peer reviewers with regard to funding or not funding; information concerning an applicant's performance and use of funds under a previous federal award; amount of funds available for the competition; and any other information relevant to a priority or other statutory or regulatory requirement applicable to the selection of applications for new awards."
Findings and Recommendations
There are a number of areas in which the peer review process might be improved; in many cases this improvement would require a change not in the standards, but in the way they are implemented. This section presents the study findings and recommendations in eight key areas: (1) enhancing the match between applications and reviewer expertise; (2) reducing reviewer workload; (3) bolstering the professional development of reviewers and applicants; (4) clarifying the standards; (5) modifying several of the review criteria and weightings; (6) eliminating standardization of scores; (7) providing better feedback to applicants; and (8) exploring the use of technology. The goal of the recommendations presented below is to improve the quality of the peer review process in such a way that scientifically appropriate research is supported and the public interest is protected in the selection of grantees, while at the same time, reviewers have an opportunity to learn from the expertise of their peers on the review panels.
1. Enhancing the Match Between Applications and Reviewer Expertise
Findings
According to a paper prepared by U.S. Department of Education (ED) staff (Karp et al. 1995), there are a number of constraints on OERI's ability to obtain adequate numbers of good reviewers, including reviewer unavailability (e.g., too busy), conflict of interest, and unwillingness to serve (e.g., takes too much time, too little remuneration, no professional advantages). Further constraints are introduced by the need to ensure ethnic, racial, gender, and geographic diversity; the difficulty of finding experts in specialty areas; the current once-a-year submission dates, which result in competition for reviewers among institutes; and a lack of adequate funds for reviewer costs. OERI is permitted by law to spend only up to 1 percent of program funds (by program account) for reviews.1 Funds cannot be moved from one program account to another or from salary and expense accounts to program accounts to pay for reviewers. Finally, OERI staff have had far less time than staff in other federal agencies to locate reviewers or constitute review panels. In the FY 1997 field-initiated studies (FIS) grant competition, staff had only 3 weeks to find reviewers. At the National Science Foundation (NSF), in contrast, staff generally have 6 weeks to select reviewers for a given competition.
Analysis of the fit between reviewer backgrounds and competition subject areas indicates that some of the FIS panels included one or no reviewers with research expertise. Moreover, in our analysis we found a relationship between panel credentials and review quality. The three panels that appeared to have no reviewers with research training and experience in the subject area of the competition produced no good reviews according to our criteria; the majority were indifferent or poor. Conversely, for the three panels identified as having strong researchers in the field of the competition, we found that the majority of the reviews were good, and none were poor. This was a far greater proportion of good reviews than was found for the other panels. This finding may be a function not only of qualified reviewers, but also of the give and take that occurs in panel discussion, which further enhances review quality. In addition to our analysis, comments from applicants and reviewers indicate concern about the lack of research methodology and design expertise among reviewers.
Although the evaluations are to be accompanied by "concise written comments based on the reviewer's analysis of the strengths and weaknesses of the application with respect to each of the applicable evaluation criteria," many of the reviews were found to be cursory and descriptive. The reviewers had not provided analytic comments drawing on their background knowledge and expertise.
Recommendations
1-1. All reviewers should meet the requirements set forth by the standards. The importance of reviewers' research knowledge and background is directly acknowledged in the third reviewer qualification cited earlier: "in-depth knowledge of theoretical perspectives or methodological approaches in the subject area of the competition." The standards appear to be an effort to raise the bar, ensuring a high-quality group of peer reviewers who will be able to lend expertise, insight, and stature to OERI's peer review efforts. By requiring that all members of a panel meet all three reviewer qualifications, the standards clearly indicate that creating a panel is not to be a "mix-and-match" effort, with one reviewer representing the research community, one representing practitioners, and one representing methods or theory.
There is little doubt that it takes considerable effort to find individuals who meet all three qualifications in the standards. Moreover, some high-quality reviewers might be excluded from the process because they meet some but not all of the criteria. If OERI decided to change the standards so that each reviewer need not meet all three criteria, a two-tier process might be put in place. Such a process would entail the review of applications for their technical merit by reviewers with the requisite expertise during the first-tier review. Applications that were technically sound would go forward to a second panel that would rate the applications on additional criteria. The National Institutes of Health (NIH) uses a two-tier approach in which the first panel judges applications on technical merit, while the second considers both technical merit (as judged by the first-tier review) and the relevance of the proposed study to the institute's programs and priorities.
1-2. OERI should consider establishing standing peer review panels within each institute. Each institute would have a standing panel composed of 25 to 30 individuals having the qualifications set forth in the standards. To comply with the standards, it would be important to include practitioners and policymakers with research expertise, as well as researchers with a solid understanding of policy and practice. Examples of practitioners or policymakers with research expertise are directors of assessment within state education departments and principals of laboratory schools who conduct research. Examples of researchers with knowledge of policy or practice are policy researchers, as well as researchers who work collaboratively with practitioners and policymakers.
Rules for establishing these standing panels would need to be developed, building on the current rules for ad hoc panels. These rules might require additional regulations.2 Included in these rules would be the process by which panelists are selected; the length of appointments (e.g., staggered terms of specific duration); and diversity criteria (e.g., ethnic background, gender, geographic location, urbanicity) that are not identified in the OERI standards, but are applied by other agencies. To ensure both high quality and prestige, standing panelists might be appointed by the Assistant Secretary for OERI in consultation with the National Educational Research Policy and Priorities Board (NERPPB).
There are several important potential benefits of using a standing panel to review grant and cooperative agreement applications. First, doing so would increase the likelihood of having highly qualified reviewers for the competitions, including reviewers from underrepresented groups, because standing panels would afford greater professional satisfaction and prestige than is offered by the present ad hoc system. Furthermore, because standing panelists would know well in advance that they would be serving on specified dates, their availability to review applications would be ensured. Moreover, when standing panels made recommendations to an applicant about how to improve an application, there would be a good likelihood that many of the same reviewers would read a resubmission. (Multiple submission dates, which would also facilitate resubmission, are discussed below.)
In addition, if standing panels brought together "the best and the brightest," many people would be willing to serve because they would find the experience both educational and professionally satisfying. Providing panelists with letters of commendation from the Secretary of Education and remuneration commensurate with the task would offer further incentives, although increasing the amount of reviewer compensation might require a change in legislation (because, as noted above, reviewer costs may currently total no more than 1 percent of the program budget).3 However, in our review of other agencies, we found remuneration to be at a level similar to that in the U.S. Department of Education.
One potential drawback of a standing panel is that it could be difficult to match the professional expertise of panel members with applications covering diverse topics and employing different qualitative and quantitative methodologies. This problem is particularly relevant because the FIS programs, especially in some institutes, attract such a diversity of applications. The problem might be solved by employing multiple subpanels with different areas of expertise or recruiting additional ad hoc reviewers should specific expertise be needed. At NIH, for example, the Division of Research Grants maintains a cadre of experts who are enlisted to review individual applications when additional expertise is needed. Standing panel members or staff initiate the request for these additional reviews, whose results are presented at the panel meeting.
Another potential problem with the use of standing panels for OERI is that the FIS competitions in some institutes receive 200 applications per year. If standing panels were to meet on an annual basis, reviewers could find themselves with too many applications to read at one time. There are several ways, however, to reduce the number of applications any given reviewer has to read. These include providing detailed evaluations for competitive applications only; using preliminary applications; assigning primary, secondary, and tertiary reviewers to applications; and increasing the number of submission dates and the number of times panels meet each year to two or three. All of these options are discussed in a subsequent section.
It might also be difficult to recruit standing panelists to serve for several consecutive years given that during that time, they might not be able to apply for funds from the institute for which they were serving as reviewer. Procedures might be put in place so that panelists could submit applications in the general content area of the institute for which they were serving as reviewer.4 At NIH, for example, applications of standing panel members are reviewed by an ad hoc panel convened for the purpose. If a two-tier review process were used, panelists who had submitted applications might be able to recuse themselves from the second-tier review if their applications were being considered. They could be replaced by a member of their three-member panel from the first tier. In the case of center reviews, those who had submitted an application would not participate on the review panel.
1-3. The quality of the research design should be rated only by reviewers with appropriate technical expertise. To enhance the quality of the reviews, additional reviewers with expertise in the research design associated with specific applications could evaluate the applications on this criterion. This approach might require the use of ad hoc reviewers with specialized expertise. A two-tier approach, as previously described, employing first-tier reviewers with the requisite technical expertise, would also support this recommendation.
1-4. The size of review panels should be increased. It is important to have a panel of sufficient size, along with a concomitant increase in the number of applications read by a given panel, so that there will be (1) a high probability of the panel's reviewing applications that range in quality (from those likely to be successful to those likely to fail), (2) sufficient breadth of perspective and expertise on the panel to ensure that innovative applications will not be overlooked because reviewers fail to understand them, and (3) an opportunity for individual panelists to calibrate their reviews/ratings against applications they have not read quite so carefully before formulating their independent ratings. The NSF accomplishes this by dividing a comparable number of applications among a smaller number of larger panels; a comparable number of applications is assigned for written reviews to each panel member, but all panelists have the opportunity to examine all applications to be reviewed by the panel. A two-tier process might accomplish some of the same objectives, but would require that first-tier panel summaries as well as ratings be provided to second-tier reviewers to give them the perspective afforded by access to additional applications.
1-5. The database of reviewers should be improved. Because many OERI staff are not researchers in the fields in which they manage research, it would be valuable for them to undertake reviews of bibliographies in those fields, if they do not already do so, in order to locate reviewers with the qualifications specified by the standards. Staff should also systematically solicit names from professional association representatives, grantees, and panelists on ad hoc or standing review panels (if such standing panels are created as recommended above). Many other agencies maintain a central database of reviewers that staff can access. If standing panels are formed within each institute, a database of reviewer names will be useful for selecting additional specialized reviewers.
The identification of standing panelists will require a more elaborate selection process. In all cases, staff might systematically evaluate the performance of reviewers and continue to use only those reviewers who have demonstrated satisfactory performance, as evidenced by mature judgment, balanced perspective, objectivity, ability to work in a group, reliability, and integrity, as well as the preparation of adequate review comments. Reviewers who cancel at the last minute, come to peer review meetings unprepared, or write minimal comments have a negative impact on the review process and must be disqualified from participating in future reviews.
1-6. OERI staff should attempt to issue grant announcements earlier in the fiscal year, thereby increasing the amount of time available for selecting and assigning reviewers. As noted earlier, staff at other agencies have more time than OERI staff to select reviewers. If standing panels were used, staff would not have to select reviewers, but would have to assign applications to the appropriate reviewers (at least three per application) and, if appropriate, assign lead and secondary reviewers. Additional time would also give staff a chance to assess the content of the applications prior to finalizing review teams so as to optimally match reviewer expertise to application topics, as well as to select ad hoc reviewers if necessary.
1-7. There should be well-established submission dates, staggered by institute. In some previous competitions, reviewers have not had enough time to review applications because there has been such a small time window between the submission of applications and the conduct of the review. Well-established submission dates for each institute, with the dates for the different institutes staggered over a 3- to 4-month period, would ensure the submission of applications in time for staff to select reviewers carefully and constitute appropriate review panels. In addition, the use of staggered submission dates would make it possible for reviewers to read applications for more than one institute. In making grant announcements, staff could estimate the amount of funding available for a competition, using the previous year's funding as a floor.
2. Reducing Reviewer Workload
Findings
As mentioned previously, many of the reviews were found to be cursory and descriptive rather than comprehensive and analytic. According to reviewers and ED staff, this is due in part to the short time available for preparing reviews, and in part to the large volume of materials reviewers have to review.
Recommendations
2-1. Logistical and other support for reviewers should be increased. Reviews could be improved by giving reviewers more time for their reviews, both prior to on-site meetings (as discussed above) and during the panel sessions. Making computers and disks available during the panel sessions for all reviewers who wanted them would also help reviewers provide more coherent and elaborate comments (see the discussion of the use of technology later in this chapter).5
2-2. Applications that are non-research should be disqualified prior to peer review. The definition of research is very broad, and might be difficult to narrow, given congressional support for a broad definition.6 Nonetheless, 5 or 6 of the 29 FY 1997 FIS applications reviewed were not really research studies. Most of these applications were from practitioners who were seeking funds to develop and implement a program, curriculum, or software. For example, one applicant proposed implementing a variety of afterschool activities for students, while another proposed constructing an interactive Web site.
Staff should work with a common definition of research. They should review applications to identify those that do not fit the definition. Providing staff with a screening checklist might help weed out non-research applications. If the staff decide to submit such applications for peer review anyway (for fear of a challenge), they should give reviewers more explicit assistance in responding to these applications. Reviewers should be encouraged to explain in their reviews how the applicant could revise the application to focus it on research (if that appears feasible) or improve its development or implementation proposal (or both).
2-3. Detailed evaluations should be provided for competitive applications only. There are currently no procedures for screening applications for merit prior to the reviewers' full evaluation. Reviewers might initially screen each application submitted under a particular competition as likely to be "competitive" or "noncompetitive." Only applications rated as likely to be competitive by a majority of reviewers would then receive a detailed evaluation. For those deemed noncompetitive, staff could prepare a summary of the discussion and send it to the applicant, with extended suggestions for improvement. An advantage of this procedure is that it would increase the time available to peer reviewers for evaluating quality applications and shift more of the routine work to staff. If this process were to be used, staff would have to define cut-off scores for noncompetitive applications.
2-4. Decrease the number of full applications through the use of preliminary applications.7 A pre-application is an abbreviated grant application, which is typically judged on a subset of evaluation criteria (management is not described, for example). Pre-applications are reviewed by staff and/or peer reviewers. Those applicants whose pre-applications are judged favorably are then encouraged to submit full applications, while others are discouraged, although anyone is free to submit.8 Many agencies use such preliminary applications, including ED's Fund for the Improvement of Postsecondary Education (where peer reviewers read pre-applications) and NSF (where they are read by program staff and/or peer reviewers). Use of pre-applications would reduce the number of full applications received; it would also allow staff to estimate the number of full applications to be received, and thus the types and numbers of peer reviewers needed. Drawbacks to the use of pre-applications include reduced information available to reviewers for forming their initial judgments and possibly lengthening of the review process, which would now include two stages of application review.
2-5. The number of pages permitted for center applications should be decreased and attachments limited. This would reduce what has proven to be a heavy reviewer workload. At present, some center applications are hundreds of pages long.
2-6. Provide planning grants for center competitions. Applicants for center funds would submit a preliminary application. Page limits would be specified. The focus would be on a conceptual framework rather than on individual studies. Selected applicants would then be provided planning grants to prepare full applications that would provide more detail on the individual studies to be undertaken, as well as on other criteria, such as the management plan. As at NSF, a panel would be convened to rate applications at each stage of the review.
2-7. Primary, secondary, and tertiary reviewers should be assigned to applications.9 This type of review is characterized by larger review panels, ranging from 8 to 25 reviewers. Thus, this recommendation would apply only if the size of the review panels were increased. To decrease reviewer workload, OERI might institute a system in which different reviewers would have differing responsibilities for evaluating the same applications. At NIH, some reviewers might read applications more thoroughly and have primary responsibility for presenting findings and recommendations to the review panel. All panel members could read an application if they chose to do so, and could comment on and ask questions about that application in the discussion. A potential disadvantage of this approach is that primary reviewers might dominate the panel discussion.
2-8. Establish multiple submission dates each year. This approach is used at NIH, where study panels meet several times a year, thereby reducing the number of applications read at any given time. A disadvantage of this approach is that it would engender additional costs and be time consuming for staff. In addition, several OERI staff believe multiple submission dates might actually increase the total number of applications submitted each year because applicants would have more than one opportunity to submit an application.
3. Bolstering Professional Development
Findings
According to ED staff and reviewers, having higher-quality applications would facilitate the review process. In addition, many reviewers expressed concern that some applicants with worthy projects do not have the capability to prepare adequate applications.
Although the standards emphasize independent judgments in evaluation, comments from reviewers during our interviews, as well as their comments written on evaluation forms, indicate that judgments are not always made independently. Instead, reviewers sometimes revise their scores and written comments during the panel discussion of strengths and weaknesses. Thus decisions about final scores, as well as whether an application is highly recommended, recommended, or not recommended, are sometimes made collectively. Moreover, across years and types of competition, many applicants we interviewed assessed reviewer comments as not very useful or comprehensive.
There has been considerable confusion on the part of OERI staff about the appropriate qualifications of each reviewer. Some OERI staff expressed the view that at least one of the peer reviewers of each FIS application, but not all three, should be a researcher in the subject area of the competition. Staff indicated that panels have been constituted by selecting one researcher with subject matter expertise, one methodologist, and one person with expertise in policy or practice in the area of the competition, which is counter to the standards as we interpret them.
Recommendations
3-1. Enhance training for applicants. OERI staff currently offer some technical assistance to interested potential applicants at regional and national professional meetings. These efforts should be increased and made more systematic. In other federal agencies, it is not uncommon for project or program officers to provide detailed advice to potential applicants. Staff of the federally funded regional laboratories, comprehensive centers, and research centers could also provide such assistance.
3-2. Provide more in-depth training and support for reviewers. Peer reviewers currently receive some training with regard to evaluation criteria and other review procedures. For example, reviewers are provided information about their responsibilities as panel members and about completing standard evaluation forms. This training is provided through the materials sent to panelists prior to their review and as part of the basic orientation at the beginning of the panel meeting.
OERI should consider providing more in-depth training to peer reviewers. Reviewers need to understand the requirements for independent assessment, scoring, revision, and assignment of applications to funding categories. They also need to understand that their comments will be the only substantive information unsuccessful applicants receive; thus those comments should help applicants understand the decisions that were made, as well as improve their future submissions. Such training might include detailed instruction in properly evaluating grant applications. Examples of exemplary reviews might also be provided to illustrate the standards to which reviewers should hold themselves in rating applications.
Another means of improving reviewers' performance is to provide them with a set of questions to guide their reviews. Given that many of the applications fall into three or four broad categories of research design (e.g., curriculum or program development with pilot test or observation; evaluation of a curriculum or intervention; and implementation study of a state or local policy using a variety of methods, such as interviews, focus groups, classroom observation, and document analysis), it may be possible to draft a set of questions specifically appropriate to these broadly used designs. For example, there are certain questions one can ask about equivalence of treatment of control and comparison groups; about the documentation of implementation; and about the relationships among intervention, instruments, and various dependent variables. These questions could be suggested as ones reviewers might ask about the applications they review.
Moreover, many reviewers who write detailed and cogent comments on design simply do not appear to concern themselves with timelines, benchmarks, or budgets, possibly because they may not be the most appropriate judges in these areas and do not know the correct questions to ask. They may focus on small details (such as missing resumes for research assistants) because they do not know what else they should be considering. This problem is most critical with respect to center reviews, where millions of dollars are at stake. OERI should provide a series of specific questions for reviewers to ask about budgets and management plans (and also some ideas about what items are off limits, such as entertainment expenses).
Providing reviewers with elaborated scoring rubrics that would serve as criteria against which to rate applications is another way to improve the reviews. Such rubrics would explain the meaning of different numerical scores (e.g., what it means to assign 29 out of 30 or 20 out of 30 points). The rubrics would differ by evaluation criterion.
3-3. Provide professional development for OERI staff to ensure that they understand the requirements of the standards. Although many OERI staff are familiar with the requirements of the standards, some are not. For those staff needing more information, such as staff new to OERI, training would be useful and appropriate, and should be provided.
4. Clarifying the Standards
Findings
In addition to the individual qualifications for peer reviewers, the standards require that the Secretary select "to the extent feasible. . .peer reviewers for each competition who represent a broad range of perspectives." This requirement is not further elaborated in the discussion of the standards, so it is not immediately apparent how it should be implemented.
With regard to conflict of interest, the standards specify that reviewers for grants and cooperative agreements are considered employees of the Department, and as such they are subject to provisions of 18 U.S.C. 208, CFR 2635.502, and the Department policies that implement these provisions. Under those rules, reviewers are considered to have a conflict of interest if they or their immediate family, a for-profit or nonprofit organization in which they serve, or any organization or person with whom they are negotiating or have an arrangement concerning prospective employment have a financial interest in the application they are reviewing. For FIS competitions, the Department asks for a waiver so reviewers can serve even if applications from their university are expected. However, no reviewer is ever assigned to read an application from his or her university or other employer.
Although reviewers do not read applications from their own university, our analysis indicated that some reviewers had had prior professional relationships with the applicants whose applications they were reviewing. While this may not constitute conflict of interest as defined by the standards, it could predispose reviewers to judge applicants by those prior relationships, rather than solely on the merit of the application. Moreover, the appearance of a conflict can constitute a serious problem.
Finally, while the text of the application package may point out that the priorities are nonbinding, it remains unclear how reviewers and others should view those priorities in determining application strengths/weaknesses, rankings, and awards. Some applicants said the priorities were critical in their decision to submit an application. Some reviewers assigned points for national significance if an application addressed these priorities.
Recommendations10
4-1. The term "multiple perspectives" in the standards should be elaborated. Gender, race/ethnicity, and geographic considerations, including rural/urban perspectives, should be taken into account when review panels are constituted. The peer reviewer selection process might be made similar to the process at NIH or the Office of Special Education and Rehabilitative Services (OSERS) within the U.S. Department of Education. At NIH, factors such as geographic distribution and minority and female status must be considered in selecting review group members (NIH, 1992:6). In OSERS, staff consider the overall representativeness of the panels convened for a competition, especially the presence of underrepresented groups, such as minority groups and persons with disabilities. Panels should also be constituted to take into account diverse disciplinary background, theoretical orientation, methodological approach, and research role.
4-2. Conflict of interest should be further defined to include professional relationships. OERI might model its procedures on those used at NSF. NSF informs reviewers that "they may not participate in the review of any proposal in which they or a member of their immediate family or an organization of which they are or may become a part has a financial interest, nor may they be in the room when such a proposal is discussed." NSF, in its instructions to reviewers, provides examples of conflict of interest that go beyond financial interest, such as: reviewer would be directly involved in the project (e.g., as a consultant or collaborator); reviewer is from the same institution as proposer; reviewer and proposer have been related recently as a student and thesis advisor or postdoctoral advisor; reviewer and proposer are known to be close friends or open antagonists; reviewer and proposer have collaborated recently on a related project; and reviewer and proposer were coauthors on a paper published in the last 4 years.
4-3. Do not list priorities for FIS competitions. Panelists interviewed supported the use of priorities for directed research competitions but not for FIS competitions. Other agencies with more funds available for research run several kinds of competitions. At those agencies, research priorities are not specified for field-initiated grant competitions, whereas they are specified for directed research grant competitions.
5. Modifying the Review Criteria and Weightings
Findings
The standards specify broad evaluation criteria, and specific factors under each, from which OERI may select review criteria for each grant or cooperative agreement competition. The broad criteria include national significance, quality of the project design, quality and potential contributions of personnel, adequacy of resources, and quality of the management plan. The factors offer a wide range of options, with some oriented to research competitions and others more appropriate to demonstrations or program grants. The rules allow for complete discretion with respect to which of the broad categories is used in any competition, as well as which, if any, of the specific factors are selected.
Applicants and reviewers for both FIS and center competitions indicated confusion about the meaning of the national significance criterion and how it should be addressed. In our assessment of the reviews, we found that reviewers and applicants often define national significance as the importance of the problem to be addressed. They do not interpret it to include the potential contribution of the project to the development and advancement of theory and knowledge in the field in addressing an important problem.
As noted, the standards allow for considerable flexibility in both the broad evaluation criteria and the specific factors that are selected for each competition. Yet the broad criteria are the same for the FIS and center competitions. This is problematic because the two kinds of competition differ so greatly in scale. For example, the evaluation criteria for quality of project design for the center competitions are currently written as if reviewers were evaluating a single research study, as is the case with the FIS competition, rather than a series of studies, which is what most center applications propose. Partly as a consequence of this, we found that reviews of center applications rarely discuss and assess the quality of the proposed study design. Instead, they focus on the overall conceptual framework. As a second example, reviewers and ED staff commented on the need for increasing the weight given to the management criterion for centers. The management component is very important for centers because most centers are located in multiple, geographically diverse sites and involve the management of complex research activities across these sites. This is generally not the case for FIS.
Recommendations
5-1. Clarify the meaning of "national significance." The review criterion of national significance should be clarified so reviewers and applicants understand that it refers to both the importance of the problem to be addressed and the potential contribution of the project to the development and advancement of theory and knowledge in the field.
5-2. Further elaborate the project design criterion for center applicants. As noted, reviews of center applications often do not address the individual studies being proposed. In part, this may be due to a lack of clarity in the evaluation criteria about whether such analysis is needed.11
5-3. Increase the weighting for management for center applications. Different weightings for FIS and center competitions on this criterion may be warranted, given the complex activities typically undertaken by centers across sites.
6. Eliminating Standardization of Scores
Findings
With respect to standardization of scores, some grant competitions and some institutes have elected to use the Department's standardization process designed to correct for possible bias introduced by different reviewers' approaches to assigning raw scores. According to a report prepared for the Department by Advanced Computer Systems, Inc. (1992:10), the standardization process is based on a set of assumptions about the distribution of applications:
| that the varying quality of applications in the entire pool of applications is normally distributed; a similar number of good, average, and poor applications is reviewed by each panel; the applications distributed to each panel are normally distributed; all panels have the same training and direction; and any resulting deviation is due to reader bias (Analysis of the Grants and Contracts Management System Score Standardization Program). |
Staff we interviewed indicated that FY 1997 FIS applications were not randomly assigned to panels within institutes, but assigned on the basis of application topic. Moreover, interviews with reviewers revealed that some were aware of the standardization process and had adjusted their scores to ensure that favorite applications would have a better chance of being funded. Other reviewers were not aware of the process, and thus had rated applications without regard to standardization. In examining applications, we found that at least five of those sampled were not research, and the scores for these applications may have skewed the distribution.
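To make concrete what a score standardization of this kind does, the following is a minimal sketch only. The report does not give the Department's actual formula, so the sketch assumes a simple per-reviewer z-score (each raw score minus that reviewer's mean, divided by that reviewer's standard deviation). The function name `standardize_by_reviewer`, the reviewer labels, and all scores are hypothetical.

```python
from statistics import mean, pstdev


def standardize_by_reviewer(raw_scores):
    """Convert each reviewer's raw scores to z-scores.

    raw_scores: {reviewer: {application: raw score}}
    Assumption: a plain z-score per reviewer; this is NOT taken from the
    Department's standardization program, whose formula the report omits.
    """
    standardized = {}
    for reviewer, scores in raw_scores.items():
        mu = mean(scores.values())
        sigma = pstdev(scores.values())
        standardized[reviewer] = {
            # A reviewer who gives every application the same score has
            # sigma == 0; map those scores to 0.0 rather than divide by zero.
            app: 0.0 if sigma == 0 else (score - mu) / sigma
            for app, score in scores.items()
        }
    return standardized


# Hypothetical data: a lenient and a harsh reviewer who rank three
# applications in the same order.
raw = {
    "reviewer_a": {"app1": 90, "app2": 80, "app3": 70},
    "reviewer_b": {"app1": 70, "app2": 60, "app3": 50},
}
z = standardize_by_reviewer(raw)
```

Under this assumption the lenient and harsh reviewers end up with identical standardized scores, which is the intended correction. The same arithmetic also shows why the findings above matter: every reviewer's scores are recentered on zero, so a panel that was assigned an unusually strong set of applications by topic would have those applications pulled toward the average.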
Recommendation
6-1. Do not employ standardization for FIS and center competitions. Standardization would be rendered unnecessary by the use of second-tier panels that would review and rank the applications of finalists from the first tier, as discussed earlier, or standing panels that would review at one time all applications submitted to an institute.
7. Providing Feedback to Unsuccessful Applicants
Findings
OERI staff currently send copies of the reviewers' Technical Review Forms to each unsuccessful applicant. These forms contain the applicants' raw scores, along with reviewers' descriptions of strengths and weaknesses associated with each evaluation criterion and a brief summary of the review. Many applicants reported that the reviews they received were not useful. As previously mentioned, our analysis indicated that many of the reviews were cursory.
Recommendation
7-1. OERI staff should consider providing unsuccessful applicants with more detailed feedback on their applications. Previous recommendations aimed at improving the review process should help achieve this objective. In addition, staff might provide a written summary of the panel discussion of each application, a procedure used at both NSF and NIH, where it constitutes the main feedback to applicants and incorporates written review comments.
8. Exploring the Use of Technology
Findings
Our review of the peer review process in other agencies revealed little evidence of the use of technology in peer review, although NSF uses a method of electronic filing of applications that enables applicants to track the progress of their applications. Nonetheless, both the OERI staff we interviewed and the expert panel members expressed interest in pursuing the use of technology to expedite the peer review process.
Recommendation
8-1. OERI should consider a small pilot project to determine whether and how technology could be used to support the peer review process. For example, if a two-tier review model were adopted, reviewers might read applications during the first tier and upload their reviews to a Web site devoted to the competition. After all reviewers had had the opportunity to read each other's reviews, they might confer by telephone or e-mail to resolve differences and decide which applications should move forward.
Summary of Recommendations
This section highlights the study recommendations that are most central to improving the OERI peer review process.
First, standing panels of 25 to 30 reviewers should be established in each institute. Reviewers should be carefully selected to ensure that each meets the criteria established by the standards. Panels should be constituted to ensure racial, ethnic, geographic, and gender diversity. Moreover, a balance between senior and junior scholars should be sought. Proposed panelist slates should be approved by the institute directors and the Assistant Secretary for OERI, in consultation with NERPPB.
The reviewers on these standing panels should serve set (e.g., staggered 3-year) terms and form the core of reviewers for each institute. For the center competitions, a subset of standing panelists should be used. Decisions about which panelists to select for a center competition and the number needed should be based on the applications received for a particular competition. The subpanelists could also serve as midterm reviewers, thus ensuring consistency in the review process.
For field-initiated competitions, there are two options for the review process. The first would entail the formation of six- to eight-member subpanels from the membership of the standing panel; these subpanels would provide the first tier of review. The first-tier review process would function much like the current process, except the subpanels would comprise primarily standing panelists and would be expanded from three members to six to eight members to provide a broader context for the review. Applications would be allocated to subpanels on the basis of the panelists' subject area expertise and experience. If the review of some applications required special technical expertise, the subpanels could be supplemented with ad hoc reviewers. During each review cycle, the team leaders of each subpanel would meet for an additional day for a second-tier review to rate all the top-ranked applications from the first-tier subpanels.
A second option for the FIS review process resembles the process used at NIH, where the entire panel reads all applications. At NIH it is typical for a group to review 75 to 100 applications at each meeting. Each member is asked to prepare detailed reviews for a dozen or more applications. The meetings are conducted by a chair who is a peer, assisted by a staff member. Those preparing the written reviews lead the discussion of the applications assigned to them. Each application is discussed and considered. Decisions not to recommend for further consideration are made by majority vote.12 If a member disagrees, he or she can submit a minority report, and when there are two or more dissenting members, a minority report must be drafted. Members who cannot assess the merits of a proposal can abstain from voting, although abstentions are not encouraged. Applications can also be deferred, perhaps for a site visit or to obtain additional information. Those applications not rejected or deferred are assigned a priority score by each member. These scores are averaged by the staff member after the meeting. In addition, a summary statement for each application, prepared by the staff person involved in the review for transmittal to the council and the applicant, shows a percentile ranking for the application against a reference base of all applications reviewed by the committee over three meetings, including applications not recommended for funding or deferred (helping to minimize the effects of a single meeting). The written comments of panel members and the panel discussions are the basis for these summary statements.
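The averaging and percentile steps described above can be illustrated with a short sketch. It is an illustration under stated assumptions, not NIH's actual computation: the percentile is taken here to be the share of reference-base scores at or below the application's average score, the reference base is represented simply as a list of scores from recent meetings, and the report does not say how applications without a priority score enter that base. The function names and all numbers are hypothetical.

```python
from statistics import mean


def average_priority_score(member_scores):
    """Average the priority scores assigned by individual panel members."""
    return mean(member_scores)


def percentile_rank(score, reference_scores):
    """Percent of reference-base scores that are at or below `score`.

    Assumption: this simple definition stands in for whatever formula
    the agency actually uses, which the report does not specify.
    """
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return 100.0 * at_or_below / len(reference_scores)


# Hypothetical reference base pooled over three meetings.
reference = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

# Hypothetical scores from three panel members for one application.
avg = average_priority_score([2.0, 2.5, 3.0])
pct = percentile_rank(avg, reference)
```

Pooling the reference base over several meetings, as the text describes, means that one unusually strong or weak batch of applications shifts an application's percentile less than it would if each meeting were ranked on its own.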
Standing panels might also be involved in other activities, such as recommending how OERI could help foster research in a particular area in which good applications had not been received; helping to select new panelists and ad hoc reviewers; and reviewing grant-produced products, especially once the Phase 3 standards have been put in place. Panels might also provide continuity in the assessment of applications so that rejected applications that had been revised and resubmitted would be reviewed by at least some of the same people. In addition, panelists could serve on midterm review teams for operating centers.
Given that some institutes receive up to 200 applications annually, methods for reducing reviewer workload should also be considered. Several possibilities are elaborated in this report. They include, for example, the use of preliminary reviews to reduce the number of full applications receiving a detailed evaluation, and the use of pre-applications, with only a subset of applicants being asked to prepare a full application.
This report also makes other recommendations for improving the OERI peer review process. First, professional development should be enhanced for ED staff, especially those new to the process, as well as for applicants and reviewers. Additionally, reviewers would benefit from questions to guide their reviews and from elaborated scoring rubrics. The standards would benefit from clarification in several areas: the term "multiple perspectives" should be further defined to ensure that panel membership is balanced by gender, race/ethnicity, and geographic location as well as by disciplinary background, theoretical orientation, methodological approach, and research role; conflict of interest should be defined to include professional relationships as a source of conflict; and priorities should not be listed for FIS competitions. Modifying the review criteria and weightings would also enhance the process: the meaning of "national significance" should be clarified; for center competitions, the project design criterion should be elaborated; and weighting for management should be increased. Standardization of scores should be eliminated as well; the use of second-tier panels and standing panels would make this process unnecessary. Finally, more detailed feedback should be provided to unsuccessful applicants, and the use of technology in the peer review process should be explored.
| 1 | Karp et al. (1995) provide the following example. Prior to the FY 1996 FIS competition, the program was funded at approximately $1 million per year, and OERI routinely received over 300 proposals. As a result of the legal cap on review expenditures, only $10,000 was available to pay for travel, per diem, and honoraria for reviewers to read the applications. |
| 2 | Lawyers from the Office of the General Counsel in the U.S. Department of Education noted this possibility (personal communication, September 1998). |
| 3 | If the standing panel performed a variety of roles of which peer review of applications was only one, panelists could also be employed as consultants under program or salaries and expenses authority. |
| 4 | This would require consultation with the U.S. Department of Education lawyers. |
| 5 | ED staff do not believe implementing this recommendation is feasible at the current time, although panelists we interviewed requested that such an option be explored. |
| 6 | Public Law 103-227, The Educational Research, Development, Dissemination, and Improvement Act of 1994, Section 912(l)(6) defines educational research: "The term educational research includes basic and applied research, inquiry with the purpose of applying tested knowledge gained to specific educational settings and problems, development, planning, surveys, assessments, evaluations, investigations, experiments, and demonstrations in the field of education and other fields relating to education." Section 912(l)(7) defines the term field-initiated research: "The term field-initiated research means education research in which topics and methods of study are generated by investigators, including teachers and other practitioners, not by the source of funding." |
| 7 | A change in the regulations might be required to implement this procedure. |
| 8 | The current standards for peer review do not permit pre-application. |
| 9 | This recommendation might require a change in the regulations. |
| 10 | Clearly, these recommendations would require a change in the standards. |
| 11 | Implementing this recommendation might necessitate changes in the standards. |
| 12 | This procedure is not permissible under the current standards. |