On promoting rigour in educational research: the example of the RAE?[1]

(BERA Paper, Warwick, September 2006)

Ian Stronach

'One shouldn't complicate things for the pleasure of complicating, but one should also never simplify or pretend to be sure of such simplicity where there is none. If things were simple, word would have gotten round, as you say in English.' (Derrida, 1988:119)

One might argue that the 'simple' word did get round, in relation to the excellence criteria of the UK Research Assessment Exercise (RAE) 2008:

4* Quality that is world-leading in terms of originality, significance and rigour.

3* Quality that is internationally excellent in terms of originality, significance and rigour but which nonetheless falls short of the highest standards of excellence.

2* Quality that is recognised internationally in terms of originality, significance and rigour.

1* Quality that is recognised nationally in terms of originality, significance and rigour.

U/C Quality that falls below the standard of nationally recognised work. Or work which does not meet the published definition of research for the purposes of this assessment.

The purpose of this paper is to deconstruct the categories, criteria and rubrics of the RAE, particularly as they apply to Panel K, sub-panel UoA 45, Education. It is hoped that such a critical consideration can help participants of all shades enter a debate that will inform decision-making and critique. It is also a hope, though a less likely one, that such reconsiderations can help inform future policy development in the area of research performance appraisal more generally. It should be stressed, for the benefit of those unfamiliar with 'deconstruction', that such an undertaking seeks out the unspoken, the implicit and the contradictory in these rubrics. In so doing it aims for a 'provocative' validity, and the performance of alternative interpretations. These do not constitute a new bedrock on which appraisal can confidently be founded. Instead, they unsettle conventional readings in ways which open up new understandings and a richer appreciation of the dilemmas of this sort of appraisal, in terms of its ambitions towards 'correspondence', 'consensus' or, as I will eventually argue, 'translation', drawing on Benjamin and Derrida.

To begin at the beginning, the criteria seem to have emerged more from a political than an academic process. In identifying criteria that promised to distinguish 'world-leading' research (4*) from 'internationally excellent' (3*) and work 'recognised internationally' (2*), they raised the issue of 'internationality' in relation to research. They did so in regard to three categories of evaluation – 'originality, significance and rigour'. These were the givens of the RAE process.

They reflected government perceptions that previous RAE appraisals had been subject to credentialist inflation, and so replaced the old grades (1, 2, 3b, 3a, 4, 5, 5*)[2]. In specifying three grades of internationality, the government met grade inflation with criterial hyperinflation.

First of all, there are a number of technical and logical points that need to be made. The criteria were incompetent as criteria, in that the descriptors gave no indication as to how they might be measured or defined. Subsequent attempts by Panel K to offer 'expanded definitions of the quality levels' merely offered further undefined attributes: 'highly significant contribution'; 'significant contribution'; 'recognised contribution'; and the loser's 1* of 'limited contribution'. The 'expansion' amounted to the displacement of one unspecified performance vocabulary by another. At the same time, the tautological relation is obvious: what's world-leading? Answer: whatever's highly significant. What's highly significant? Whatever's world-leading. The 'expansion' of the quality levels, in short, is 'gaseous' in its rhetorical nature (Novoa & Yariv-Mashal 2003). Finally, the envisaged system was normed in relation to an absolute rather than a distribution: 'The definition of each quality level relies on a conception of quality (world leading quality) which is the absolute standard of quality in each unit of assessment' (RAE 01/2005, my emphases). So quality 'relies on' quality and 'is' an absolute in and of itself, shortly before disappearing up its own benchmark. Indeed, this is the 'tautology of redundant specification' (Chambers 2003: 181).

Second, there is an oddity about the categories of excellence, 'originality, significance and rigour'. The first two sound like maximum competencies, but the third suggests a minimum competence, which makes peculiar sense if one tries to align it with the evaluation criteria: what would 'world-leading rigour' look like? Would it be a good thing, or would it imply the rigidity of rigor mortis?

Third, research quality was to be appraised in terms of 'originality, significance and rigour'. This was the standard phrase across the generic RAE documentation and main panel rhetorics[3]. Panel K, however, is unusual in that it sometimes reverses this order and refers to 'rigour, significance and originality' (p19, 32, 46). The Education sub-panel followed suit (UoA draft, p5; Education sub-panel document p23). There is no such reversal in related subjects (eg UoA 41, Sociology). The Education sub-panel also spells out its take on 'rigour' quite closely in offering guidance on how each piece of research might present itself to the panel: 'in particular how the criterion of rigour is met', along with 'methodological robustness' and 'systematic approach'. Without wishing to be over-suspicious, one may note that such words as 'systematic', 'robust' and 'rigour' belong most readily to neopositivist forms of research narrative that many have criticised as 'scientism' rather than Science (Erickson & Gutierrez 2002). One might wonder if the panel is expressing, or being required to express, a coded predilection hereabouts. The sub-panel is clear elsewhere, however, that there have to be methodological horses for courses, even if it offers this reassurance in a subordinate clause:

'... but rigour can best be assessed on a case by case basis using whichever dimensions are most appropriate'

Other panels, more closely allied to the sciences, are less concerned with 'rigour'. Engineering does not mention the term and spells out its priorities in terms of advancing knowledge and understanding, originality and innovation, and impact on theory, methodology, policy, practice etc. Physics expects to see 'some of the following': 'agenda-setting, research that is leading or at the forefront of the research area, great novelty in developing new thinking, new techniques or novel results ..' (p42). Again, no mention of rigour. Linguistics, on the other hand, takes that term and disseminates it across a whole range of research qualities: 'intellectual coherence, methodological precision and analytical power; accuracy and depth of scholarship; evidence of awareness of and appropriate engagement with other work in the field or sub-field'. So the notion of 'rigour' can be either dropped or exploded, and by disciplines more readily called 'Sciences' than educational research or indeed Psychology.

Whence the Education sub-panel's additional concern for 'rigour', and the special emphasis expressed in Panel K? Is this the semi-sciences reflecting the same obsessive concern for respectability as the semi-professions? Put provocatively, is 'Science' mobilised as Rigour within the RAE sub-panel process in order to make an honest woman of Educational Research?

Thus far, in relation to 'rigour', we have some straws in the wind – no more – a shift in word order, a foregrounding in terms of specification, a hypothetical bias towards particular 'scientific' approaches to social inquiry – a possibility noted by Armstrong & Goodyear (2005)[4]. The rest of this paper seeks to say more about the location and import of this notion of 'rigour' in the hope that its opening-up will lead to more clarity for those who orchestrate, execute or endure the RAE process.

There is a research basis for the RAE categories of appraisal. Wooding and Grant (2003) conducted research on behalf of HEFCE across a broad range of disciplines in order to establish the key categories against which the research community would wish to see their work appraised. They did so via a number of workshops spread across the country. It is reasonable to assume that their work is sufficiently representative across most disciplines concerned in the RAE. In their Executive Summary they reported that there was a consensus that high quality research was based on 'rigour; international recognition; originality; and the idea that the best research sets the agenda for new fields of investigation' (p3, my emphasis). So there we have a vindication of Panel K's and the Education sub-panel's re-prioritisations – 'rigour' is the paramount concern.

But when we examine the full research text of which the Executive Summary claims to be an accurate summary, we find the following statement:

'The concept of defining the research agenda by framing new research questions and advancing a field into new areas was seen as the most important characteristic of high quality research' (p14, my emphasis).

The enumerations the authors give in Fig 11 confirm this. The priorities are reported in terms of respondents' views as (1) defining the research agenda (99 attributions), (2) rigour (71 attributions) and (3) international recognition (66 attributions). So rigour is not, as their Executive Summary claims, the leading requirement. Struck by this slip, I thought that an examination of the research annexes might be interesting. 'Defining the research agenda' (DRA) had been one of the HEFCE researchers' emergent categories, and items like 'advancement of the field' and 'potential to move discipline forward' had typically been included in it. But an item 'advance body of knowledge' was attributed to a separate category called 'scholarship', which seemed indefensible since it fitted perfectly into DRA: 'advancement of the field' and 'advance body of knowledge' are synonymous. Again, the category of 'rigour' had been awarded the response item 'depth', which is more reasonably attributable to the notion of 'originality'. Finally, there was a remarkable accident to one of the workshops' appraisals of the importance of 'originality'. Although originality had been highly rated by other workshops, there was a zero score for one workshop. It seemed unlikely that a group of academics would be so uninterested in 'originality'. Further exploration revealed an explanatory footnote:

'The sticky hexagon fell off the chart and was not available for voting' (p5).

A case of the 'hanging chad', as a colleague remarked. The other workshops' scores for that category were not averaged (which would have given 7.75); instead, a score of 0 was presumed. Taking these sorts of anomalies into account, a re-analysis of the actual workshop data would read as follows:

Defining the research agenda – 120

Originality – 69

International recognition – 66

Rigour – 62

So the category of rigour was first in the Executive Summary, second in the main text, and fourth in relation to the data in Annex II. 'Rigour' was a shifting if not shifty signifier, whose analysis had certainly not been rigorous. Now, it is almost always a good idea in relation to UK governance to adjudicate cock-up/conspiracy theories in favour of the former, but it does seem possible that 'rigour' might be experiencing some unwarranted promotion.
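The re-ranking arithmetic can be made explicit with a minimal Python sketch. It is an illustration only, not part of the original analysis. It sorts the attribution counts reported above and shows where 'rigour' falls. The dictionary names and the `rank_of` helper are hypothetical, and the counts are only those given in the text. Fig 11 is cited only for its top three categories, so originality's count there is left out rather than guessed.

```python
# Illustrative sketch: counts are only those quoted in this paper.
# Fig 11 (main text): only the top three categories are reported here, so
# originality's main-text count is deliberately omitted rather than guessed.
fig11_main_text = {
    "Defining the research agenda": 99,
    "Rigour": 71,
    "International recognition": 66,
}

# The paper's re-analysis of the Annex II workshop data.
reanalysis_annex_ii = {
    "Defining the research agenda": 120,
    "Originality": 69,
    "International recognition": 66,
    "Rigour": 62,
}


def rank_of(category, counts):
    """Return the 1-based rank of `category` when counts are sorted high to low."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return ordered.index(category) + 1


if __name__ == "__main__":
    for label, counts in [("Main text (Fig 11)", fig11_main_text),
                          ("Re-analysis (Annex II)", reanalysis_annex_ii)]:
        print(f"{label}: rigour ranked {rank_of('Rigour', counts)} of {len(counts)}")
```

Run as a script, it prints that rigour is ranked 2 of 3 in the main text and 4 of 4 in the re-analysis. The Executive Summary's first place cannot be computed, because the summary gives no counts.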

Given the Education sub-panel's unusual concern for 'rigour', it is worth raising some questions that the panel might want to consider and even address publicly, in a spirit of reassurance. They are: is there a particular promotion of the notion of 'rigour'? Is that promotion, if such it is, part of paradigmatic prejudice? Will 'rigour' as a minimum competence be used to police the other two categories?[5] Are covertly hegemonic moves being made in relation to RAE documentation?

In relation to all of the above questions, it is relevant to note that the panel has detailed requirements for members to declare interests. All of these are 'familial' in nature (same institution, former student, collaborator, etc.); none concerns paradigmatic prejudice, and yet as a BERJ editor of ten years' standing, I know that such bias is far more likely than any other. So why that gap? This is a particularly acute omission when one considers how explicitly other panels have marked out excellence as a matter internal to specific fields and concerns:

'In view of the diverse nature of the discipline of sociology, the sub-panel understands the quality descriptors to relate to indicators within fields, sub-fields and cognate areas' (UoA 41, p47).

Philosophy, too, invokes immanent criteria: '... will judge submissions against the best work in the field' (p40).

To return to the specifics of the Education sub-panel's guidelines, they are unusual in that they offer a very specific example of what might count as a justifying rhetoric for any single publication (150 words are permitted in order to provide evidence of the claim to originality, significance and rigour). It reads as follows:

'Hypothetical Example:

Humanities in primary schools – short booklet containing advice to teachers and policymakers based on a synthesis of international research. This provides an innovative conceptualisation of the field and has been referred to by the TDA for Schools (2007) as the basis for its criteria for CPD in this area. The review considered 1250 references of which 41 met the criteria for inclusion. The full review has been accepted by Springer in 2008. The distillation of the implications of the literature was done by a working group of five researchers and five teachers. The draft was refereed by two international referees and piloted by ten teachers and four policymakers to ensure it was appropriate and user-friendly. It is cited by two researchers because of the considerable work involved in its production; they contributed equally and co-directed the project, financed by a £15,000 grant from Tesco.' (p32)

There are a number of points to be made here. First, this is the only example. In the draft there were five such examples (UoA 45, draft as at 16.7.05, p4-5), and one wonders why that plurality was narrowed down, making the example much more likely to be read as an exemplar or even as a template. Second, the earlier draft had taken a very different tack, emphasising contributions to theory or methodology, prestigious keynotes, international awards and so on (ibid., p4). Would it be unreasonable to envisage the heavy hand of the State intervening in this peer-led process of consultation, promoting itself and its agencies as the User/Methodologist Who Mattered? Third, the narrative is almost exclusively focussed on local and national users, and on outlining the detail of a research process in accordance with one particular notion of rigour. The evidence for originality is very limited (a claim of 'innovative conceptualisation'), and the significance is clearly circumscribed by the national. So that's a clear 1*? Yet it would be a foolish reader who thought that it was included to illustrate the bottom end of the range. It is also hard not to associate the methodological orientation of that narrative with the foregrounding of a particular approach to educational research, with its characteristic notions of rigour, robustness and systematicity: the features we earlier wondered about in relation to panel predilections. If we then ask ourselves 'what kind of writing is this?', there is a simple and clear answer. This is the kind of 'structured abstract' that Sebba has been calling for since 1998, and which is currently based on an extension of the work of Hartley (2003). It performs and validates a kind of neopositivism that has been quick to dismiss almost all other approaches to educational inquiry as invalid and, above all, lacking in rigour[6]. As in the fictional abstract, such studies fail to meet 'the criteria for inclusion'.
And it is 'rigour' that demarcates that exclusion.

To conclude, the categories and criteria for RAE appraisal in educational research are far from clear. There seems, prima facie, to be cause for serious concern in relation to the promotion of certain kinds of research and certain kinds of outcome. No doubt the panel will argue that its members are their own masters and will come to their own judgements, and the element of judicious peer review and consensus will of course be present. Indeed, three of the Education sub-panel members said so when this paper was presented at BERA, September 2006. Provided they make full disclosure of paradigmatic prejudices, no one will doubt the panel's integrity. But the discourses around which they construct their deliberations also surround them. The extent to which they will be prisoners, warders or governors remains to be seen: escapees they will not be.

This brief account began with the claim that the orienting metaphors of appraisal depended on notions of 'correspondence' and 'consensus'. In the shortest of shorthands these represent different epistemologies: one could plausibly toss Hammersley into the first box and Habermas into the second, albeit not with the same reverence. But both projects suppress a kind of impossibility. All translations, to move to our third metaphor of evaluation, express what Derrida has called 'an economy of in-betweenness' (2001: 179). In discussing The Merchant of Venice, Derrida argues that the economy of translation (ducats/pound of flesh; Christian/Jew) circles through both a 'proper meaning' and a 'calculable quantity' (ibid). Letter and spirit are irreconcilable yet inevitable and necessary to each other:

'This relation of the letter to the spirit, of the body of literalness to the ideal interiority of sense is also the site of the passage of translation, of this conversion that is called translation' (p184).

The RAE process is just such a plural act of conversion/translation. Quite explicitly, it re-expresses the local as a globally normative value, one that invites international appraisal, investment and emulation. That is one version of the fantastic 'passage' of translation. It is part of the global phenomenon of 'comparison, defining a new mode of governance' (Novoa & Yariv-Mashal 2003: 428; Stronach 1999). It involves the incommensurable translation of a 'field' into a 'harvest', a transubstantiation that is no less mystical than that of 'bread' or 'wine' in Christian ritual, to borrow Benjamin's analogy for the notion of 'translation'. In addition, it effects the parallel construction of the 'self-auditing academic' (McWilliam 2004: 162), and with it another 'passage', this time of identity rather than of status or nature. Of course one could go on, but already it can be seen that the singular body of a spirit of inquiry is thus translated into sundry commodities of appraisal (individual, institutional, disciplinary, national), and various 'pounds of flesh' are weighed and translated in terms of an 'impossible but incessantly alleged correspondence' (Derrida 2001: 184)[7]. These require certain kinds of passage which cannot be legislated for, any more than law can guarantee justice in any single case. Justice, like mercy, is above the law, and can have no a priori regulatory transparency, any more than can virtues such as tact. The planned move of future RAEs to 'metrics' will inevitably make such translations even more injudicious, by obscuring the 'objectified subjectivity' of the exercise (Velody, cited in Bence 2005: 151; see also von Tunzelmann & Mbula 2003: 15). The translation of our pounds of flesh into RAE ducats is inherently incommensurable, and that is the scandal of assessment which the RAE, and even more its future metrification, seeks to suppress[8]. If such appraisal has an essence, it is the absence of all possible rigour.

'If the problem, the aporia, in any of these cases is resolved not through experience, through the "ordeal of the undecidable", but through recourse only to calculation or a formula – which is always somebody's formula – then it will have been a sell-out, a set-up by and for one economy, "fixed" from the start' (Derrida, cited in Davis 2001: 95; see also Derrida 2005, Richter 2002).

Derrida closes his 'translation' argument concerning Shylock and The Merchant of Venice by arguing that it is ultimately mercy ('It droppeth as the gentle rain from heaven'), rather than justice, for which we ought to pray. Maybe that's right.

'Though justice be thy plea, consider this,
That in the course of justice none of us
Should see salvation: we do pray for mercy
And that same prayer doth teach us all to render
The deeds of mercy.' (The Merchant of Venice, Act 4, Scene 1)

..................................

[1] Thanks for critical comment to Jo Frankham and Harry Torrance.

[2] It may seem strange that a simple numerical scale should become so contaminated with letters and stars, but it's important also to read these scales as class markers, in the way that socio-economic classes are distinguished in the UK. To joke only a little, 3b/3a might be seen as corresponding to a distinction between 'rough' and 'respectable' working class. The economic disenfranchisement of 1/2/3b/3a in 2001 thereby enhanced the loading of funding at the top end and widened the gap between 'have' and 'have-not' in ways that might also be regarded as analogous, particularly as these cuts were conducted under the overall policy banner of 'capacity building'. There are underlying parallels, it might be argued, with New Labour inclusionary rhetorics.

[3] There is a longstanding history of 'significance' and 'originality' in relation to questions of 'recognition' in Science and its Sociology (Merton 1957). The priority of 'originality' is clear.

[4] Armstrong and Goodyear argue: 'It would not do for an assessment model to be dominated by advocates of large-scale, randomised control group experiments (or poststructuralist policy critique, for that matter)' (2005: 21). Indeed. (See also Stronach 2005).

[5] A natural riposte to such questions is to dismiss such possibilities as a slur on the integrity of reviewers. But our recent and highly analogous experience with an ESRC end-of-report review shows that such paradigm warfare does happen. The report and case study (Piper et al.) was reviewed. One reviewer said good things about the research. First, 'a qualitative approach is justified'. But 'with some reluctance I have rated this report "problematic" rather than "good"'. Why? The reviewer's 'fairly strong reservations derive from my prejudices against qualitative research'. The other three reviewers rated the research 'outstanding', which was its overall grade. The example illustrates both the possibilities of bias and its correction. It highlights the need for careful moderation across reviewers and the need to take paradigmatic bias into account.

[6] Or perhaps we're back to 'rigor' as in 'rigor mortis'. MacLure has criticised the fruits of such systematic reviews, as illustrated in the Hypothetical Example, as 'tiny dead bodies of knowledge' (MacLure 2005: ).

[7] Benjamin differentiates between the 'intended object' and the 'mode of intention' in making this point in relation to the meaning of 'bread' and 'wine' in different languages (Benjamin 1973: 74). Others might express a similar point in relation to connotative and denotative meanings. The RAE's move from overall institutional assessment to individual item appraisal has its parallels in the practices of risk management: 'The so called asset-by-asset approach dominates the scene over the portfolio-theoretical approach' (Kalthoff 2005: 75).

[8] In discussion at the BERA presentation of an earlier version of this paper, I argued that the RAE was like the 'Pools Panel', which meets in the UK to decide the results of postponed football matches so that gambling results are available. But that was the 2001 RAE. There is a difference in the 2008 RAE exercise. This time the panel's wisdom is far greater: no longer content to decide the outcome of a match that was never played, it also decides how well each player performed.