Character in the Corpus: Corpus and Virtue

What’s in a corpus?

As language is our primary means of social interaction, the way by which we codify morality and the fundamental material of literacy, linguistics offers a useful suite of tools for learning. One instructive approach to examining literacy is ‘corpus linguistics’. A corpus is essentially a collection of texts, and the tools and approaches of corpus linguistics are used to explore the ways in which language is used across those texts. The corpus at the heart of ‘Character in the Corpus’ comprises the 116 prose texts set at A-level in the UK.

Virtue in the corpus

The use of corpus linguistics to examine virtue is not new. Kesebir and Kesebir (2012) used the Google Ngrams corpus to track a set list of virtue terms and noted their attenuation across the twentieth century within American texts. From this, they deduced a decrease in the salience of moral concepts in US life and suggested a consequent decline in moral behaviour. Whilst it can be tempting to overstate the link between language and reality, their study raises an interesting proposition about the value of linguistic approaches to the study of virtue and perhaps identifies a need for greater virtue literacy.

A study that more directly assessed general usage of virtue terms is the Jubilee Centre and DEMOS (2019) study of social media references to virtue terms. By assessing 1 million tweets and their mention of courage, empathy, honesty, and humility, the study wrests the consideration of virtue away from the written publications that comprise the Google corpus and situates it in everyday language usage. What’s more, it uses different corpora (e.g., parliament, BBC broadcasts) to compare how virtue language is affected by context. To this end, it also looked at the immediate context of such mentions, to consider how virtue terms were used to express judgement, criticise institutions, allude to historical figures, and even debate the definition of these terms. In examining how such language relates to behaviour beyond corpus frequency, it assessed the connection between virtue language and prosocial acts, the ultimate aim of virtue literacy.

Virtue literacy is thus no stranger to corpus approaches, and studies such as these offer useful methods and points of comparison for ‘Character in the Corpus’. That corpora indicate the salience of virtue terms, although compelling, is not beyond question. Fiction, due to its particular playfulness with language, presents specific challenges and opportunities to corpus investigations. Nevertheless, the value of a corpus approach lies in its ability to define and understand virtue language by looking at its actual use in context.

Consequently, ‘Character in the Corpus’ uses a collection of ‘reference’ corpora to look at how fiction fares more broadly in its use of virtue terms. These include the British National Corpus (BNC) of approximately 100 million words drawn from written and spoken British English; the Google Books Corpus, consisting of 34 billion words of British English; and the Hansard corpus of 1.6 billion words of British parliamentary speeches. Other corpora covering TV, films, news, the internet and historical texts are adopted in individual instances when they are particularly insightful. 

Seeing these corpora as a constellation of different discourse practices that reflect different spheres of life also allows teachers to contextualise virtue in relation to pupils’ own lives and the real world. The approach itself is also recommended as a classroom activity: it engages pupils in thinking about virtue beyond literature, shows how language transfers between contexts, avoids the error of seeing fiction as too distinct from real-world behaviour and language use, and develops digital-literacy skills.

Indeed, the use of corpus approaches is warranted by their successful application as critical tools in secondary school settings. The CLiC Dickens project has recently introduced corpus techniques into secondary school classrooms. In one example, students were given the tools to explore ‘A Christmas Carol’ in whichever way they pleased. What is striking is that, when given this analytical free rein, they selected morality as the focus of their study. By giving students new ways into texts, digital approaches avoid relying on old modes of understanding, and their ‘applied’ nature is particularly appropriate for practical approaches to wisdom.

An example

By way of example, take a look at the following charts. Taken from the British English corpus, they show how virtue terms compare in general language usage and in fiction. The terms are those from the Jubilee Centre for Character and Virtues’ (2017) A Framework for Character Education in Schools that are most common in the corpus of 116 A-level set texts.

Frequency of selected virtue terms in British English
Frequency of selected virtue terms in English fiction

At first glance, these graphs indicate which virtues feature most frequently in discourse. For example, humility has dropped out of fiction and other texts, whereas a number of other virtues have seen an upswing from the 1990s onwards. The general trend downwards for virtue terms in language usage is bucked by upsurges for particular virtues in the mid to late twentieth century, albeit a trend only partially reflected in fiction.

The charts also show how these terms have fared across time, with virtues vying for different places in general usage rankings in a way that they do not in fiction. In fiction, the tendency is for virtue terms to move in step with each other. This suggests the homogeneity not just of fiction but also of virtue discourse itself. In fact, one of the features discussed as peculiar to fiction is the tendency of virtuous language to cluster: when one virtue term is mentioned, chances are other virtue terms are close at hand. This is particularly useful for virtue literacy because it means that relatively short passages of text can provide the basis for discussions of virtues, and that they often call on readers’ and characters’ critical skills in arbitrating between virtues.
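The clustering described above can be explored with a few lines of code. What follows is only a minimal sketch, not the method used by ‘Character in the Corpus’: the term list, the window size and the function names are all assumptions chosen for illustration. It flags each virtue term that has a different virtue term within a set number of words on either side.

```python
import re

# Illustrative term list only; an assumption, not the project's actual selection.
VIRTUE_TERMS = {"courage", "honesty", "humility", "compassion", "gratitude", "justice"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def clustered_mentions(text, window=20):
    """Return (position, term, neighbours) for every virtue term that has at
    least one *different* virtue term within `window` tokens on either side."""
    tokens = tokenize(text)
    hits = [(i, tok) for i, tok in enumerate(tokens) if tok in VIRTUE_TERMS]
    clusters = []
    for i, term in hits:
        neighbours = sorted(
            {t for j, t in hits if j != i and abs(j - i) <= window and t != term}
        )
        if neighbours:
            clusters.append((i, term, neighbours))
    return clusters


# Hypothetical sample passage, written for this example.
sample = (
    "She had the courage to speak, and the honesty to admit her fault. "
    "Years later, in another town, he spoke of gratitude."
)

for position, term, neighbours in clustered_mentions(sample, window=10):
    print(position, term, neighbours)
```

With a window of ten words, courage and honesty are reported as neighbours while gratitude stands alone. A pupil could widen or narrow the window to see how quickly the cluster dissolves, which is itself a prompt for returning to the passage and reading it closely.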

Of more specific interest is where these graphs diverge (considered with respect to individual virtue terms in the articles that follow). Greater use in fiction suggests its status as a site of virtue, offering stretches of discourse appropriate for virtue literacy undertakings. Greater use in general usage suggests that literature is less concerned with virtue than the world at large. In consideration of several virtues, ‘Character in the Corpus’ will point out how both of those assumptions may be false.

False friends

One note of caution with such quantitative analysis should be sounded with regard to polysemy and synonymy, two opposing concepts that represent the slipperiness of language and, in so doing, present challenges to corpus approaches.

‘Polysemy’ refers to the fact that one word may have multiple meanings. For example, when considering a virtue like justice, we might also consider the related word just. But just is a word that wears many grammatical hats: often the smallest words have the biggest roles. Whilst our interest might be in its adjectival role, it is overwhelmingly used as an adverb and, what’s more, an adverb that can perform an evaluative function depending on the sentence in which it appears. Even words that do not traverse parts of speech, such as service, may carry different meanings (denotations and connotations), as the spike in general usage around the era of military service shows.

There is, however, one distinct advantage to the polysemous soup in which some virtue terms find themselves, and that has to do with virtue literacy itself. Because literacy is fundamentally concerned with comprehending the meaning and use of terminology, it stands to reason that the process of separating virtue terms from their ‘false friends’ is part and parcel of literacy education. Furthermore, virtuous behaviour underpins or motivates many of these uses and can prompt readers to consider the ways in which virtue infuses how we think and write.

‘Synonymy’ refers to the opposite phenomenon, whereby several words may refer to one concept; for example, courage and bravery. This is an important aspect when it comes to literacy and comprehension, but one that can be easily reconciled with the idea of enriching pupils’ moral lexicon.

The above cautions are not terminal to corpus approaches, but rather serve to illustrate for teachers some of their pitfalls (and opportunities). In fact, these considerations are instructive when thinking about literacy and, accordingly, these concepts are returned to numerous times. Such quantifying should only be a way in to contextualised, qualitative discussion of these terms. Language is, after all, a verbal rather than a mathematical phenomenon. Distant reading cannot take the place of considered close, qualitative reading but merely offers a tool by which to identify general patterns for closer interrogation. In this, it complements the approach recommended for virtue literacy structured around teacher-pupil collaboration.


Virtue by definition

The word virtue offers a good starting point when taking a corpus approach to the language of character, acting as an indicator for how literacy and vocabulary are a route to comprehension.

Defining virtue is a task for both linguistics and virtue literacy. Virtue is an abstract term. The reason for this abstraction is that it is a hypernym, an umbrella term that is made concrete by the separate virtues that comprise its constituent parts. In fact, where the word virtue makes an appearance, it is often accompanied by (collocates with) words that identify individual virtues.

Indeed, the frequency with which virtue has been used shows a decline over time relative to a slight uptick in more specific virtue terms. This is a trend also apparent in fictional texts. When fiction is isolated, virtue’s use closely tracks its usage in discourse in general. Consequently, fiction would seem to replicate the fortunes of virtue in society at large.

Across language

One of the recurring problems that the character educator must face with regard to virtue literacy is the messy nature of language, in particular its fondness for attracting a variety of meanings. Linguists call this ‘polysemy’, whereby words that appear the same accrue different meanings. (It’s easiest to think of polysemy as the opposite of synonymy.)

When examining the broad array of virtue terminology, a recurring feature (and potential obstacle) is that such words are often more commonly used in their non-virtue sense. You may find it surprising that even the word virtue is used in a non-virtue sense more often than not.

Virtue can therefore serve as a cautionary tale for those embarking on virtue literacy endeavours. Take a look at how often the word virtue is used across text types:

Frequency per million words of virtue by corpus section

From this we could make pronouncements about how concerned certain spheres (academia, journalism, literature) are with virtue. Next, we might state that virtue is relatively infrequent in spoken discourse when compared with written discourse (the one exception being journalism). From these figures it would be easy to suggest that fiction is not as concerned with virtue as academic discourse and little different to magazines. Easy but mistaken.

Strange as it may seem, even a word like virtue is not always used in its virtue sense. A closer look at individual instances reveals that just two phrases, by virtue of and in virtue of, account for a large share of the occurrences of virtue. The problem is that these clusters semantically dilute the word virtue, as it is reduced to one part of a humble prepositional phrase. Virtue’s semantic role is usurped by a grammatical one (by which parts of a sentence are stitched together).

Why virtue has come to serve this role is open to many debates and theories concerning etymology, language change, and grammaticalization. It is probably fair to say that the enabling connotations of virtue have something to do with this broader usage. (In fact, a feature of virtue-related language seems to be that it is peculiarly susceptible to performing these grammatical functions, which illuminates how this terminology has been interpreted and understood over time.) Our corpus examination of virtue therefore requires a little refinement.
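As a rough illustration of that refinement, the sketch below counts the word virtue per million words twice: once for every occurrence and once after setting aside by virtue of and in virtue of. It is a minimal example under stated assumptions, not the procedure behind the figures reported here. The regular expressions, the function name and the sample text are all hypothetical, and the pattern deliberately ignores the plural virtues.

```python
import re

# Hypothetical patterns; assumptions made for this sketch only.
GRAMMATICAL_USE = re.compile(r"\b(?:by|in)\s+virtue\s+of\b", re.IGNORECASE)
ANY_USE = re.compile(r"\bvirtue\b", re.IGNORECASE)  # singular form only
WORD = re.compile(r"[A-Za-z']+")


def virtue_per_million(text):
    """Return the frequency of 'virtue' per million words, for all uses and
    for uses outside 'by/in virtue of', plus the share of such 'core' uses."""
    total_words = len(WORD.findall(text))
    if total_words == 0:
        return {"all": 0.0, "core": 0.0, "core_share": 0.0}
    all_hits = len(ANY_USE.findall(text))
    grammatical_hits = len(GRAMMATICAL_USE.findall(text))
    core_hits = all_hits - grammatical_hits
    scale = 1_000_000 / total_words  # normalise raw counts to per-million
    return {
        "all": all_hits * scale,
        "core": core_hits * scale,
        "core_share": core_hits / all_hits if all_hits else 0.0,
    }


# Hypothetical sample text, written for this example.
sample = (
    "By virtue of her office she presided. Patience is a virtue. "
    "He acted in virtue of his rank, though virtue itself was absent."
)

result = virtue_per_million(sample)
print(result["core_share"])  # 0.5: two of the four uses are in the core sense
```

Per-million figures only become meaningful on a corpus of realistic size, so the toy sample is useful mainly for checking the share of core uses. A string match of this kind also cannot catch every non-virtue use, so it is best treated as a first pass before close reading of the remaining instances.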


Stripping out noise is an important aspect of literacy exercises, but the instances set aside should not be discarded altogether. At a basic level, such instances need to be exposed as false friends, reiterating to students that not all uses of the terms discussed explicitly relate to virtue.

At a deeper level, such instances can be used as examples by which to discuss with students the ways in which virtue language has come to inform other, mainly discoursal, practices. What this does is to situate a discussion of virtue in relation to other spheres of activity, which in turn reinforces the ways that virtue informs daily life.

When we strip out this noise, we get a better picture of virtue in its core sense and find that, in fact, it has a more even distribution between discourse types.

Frequency of virtue, excluding by/in virtue of (per million words)

Reassuringly, when considered as a proportion of the instances of virtue overall, this is least of an issue for fiction, where three quarters of virtue uses are in its core sense. This suggests fiction has a particular role to play as a resource for exploring virtue.

In fiction

So, what are some of these areas of exploration?

Firstly, there is virtue’s abstract nature. This makes it particularly prone to entering into metaphorical constructions. Writers, for instance, are fond of personifying virtue and this particular trope neatly reinforces virtue as something that is to be embodied.

Secondly, another important aspect attaching to the notion of virtue in fiction is gender. Gender has not only become an important theoretical approach in literary criticism, but also informs the syllabi of secondary school literature curricula, and focussing on the language of the text can offer a way into talking about gender and character.  

Although virtue’s etymology has been ascribed to masculine virility (deriving from the Latin word for man or hero, vir), it has also come to connote feminine chastity. Indeed, a large number of the pre-twentieth-century texts in the A-level corpus adopt this second meaning of virtue.

Rather than creating linguistic ‘noise’, the semantic breadth of a word like virtue can therefore offer the character educator alternative, historical perspectives: useful material for a fruitful discussion and an enriched understanding of virtue.
