
Course II: Paper 4: The History of the English Language to c.1800: Text analysis tools

What is Text Analysis?

Many websites and software programs allow you to analyse your chosen texts. Text analysis tools let you explore a text quantitatively, e.g. by counting instances of one particular word, and systematically, e.g. by looking at the types of words and phrases used. This can be particularly useful for finding all instances of a specific word within a text. The tools will also list all the words in your chosen text by type, e.g. adjective or plural noun.

Text analysis tools also let you compare two or more texts and identify key features of the language used. You can search for occurrences of a single word, or for a more complex pattern, e.g. pairs of words within one context.
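To make "pairs of words within one context" concrete, here is a minimal Python sketch of that kind of search. It is an illustration only, not the method used by any of the tools on this page: the function name `cooccurrences`, the simple tokenising rule and the default window of five words are all assumptions.

```python
import re

def cooccurrences(text, first, second, window=5):
    """Return the contexts in which `second` appears within `window` words of `first`."""
    # Assumed tokenising rule: lower-case runs of letters, allowing internal apostrophes
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    first, second = first.lower(), second.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token != first:
            continue
        # Look `window` words to the left and right of this occurrence
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        if any(tokens[j] == second for j in range(lo, hi) if j != i):
            hits.append(" ".join(tokens[lo:hi]))
    return hits

# Invented sample sentence, used only to show the call
sample = "The old man had a pale blue eye and the eye vexed me."
for context in cooccurrences(sample, "old", "eye", window=6):
    print(context)
```

Widening or narrowing `window` changes what counts as "one context", so it is worth trying a few values on your own text.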

These tools are good for comparing how authors write across genres or types of text, e.g. fiction and non-fiction.

Researchers also put them to use to examine questions of authorship. With the tools available you can search your own chosen texts. You can also use established corpora like the British National Corpus to look for common occurrences of words and common phrases.

Useful links

If you have a text, for example retrieved from a database like ProQuest One Literature, you can run that through a text analysis tool and get information or see patterns that may otherwise be difficult to spot.

Concordance tools - search for a word or phrase and see all instances in your text, displayed with a limited amount of context

  • AntConc - a tool that you install locally and use to explore texts in various ways, for example by creating concordances, word lists and collocations.
  • LexTutor - a set of tools that you can use on pre-loaded texts or material that you add. Includes a concordance program, word list functions and much more.
  • TAPoRware - a wide range of specialist text analysis tools.
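If you would like to see what a concordance tool does behind the scenes, the following Python sketch produces a simple keyword-in-context listing. It is a rough illustration rather than a description of how AntConc, LexTutor or TAPoRware work: the function name `concordance` and the `width` parameter (the number of characters of context on each side) are assumptions made for this example.

```python
import re

def concordance(text, keyword, width=30):
    """Return one line per match of `keyword`, with `width` characters of context each side."""
    # Collapse line breaks and repeated spaces so the context displays on one line
    flat = " ".join(text.split())
    # \b ensures whole-word matches only, e.g. "cat" but not "category"
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Invented sample text, used only to show the call
sample = "The cat sat on the mat. The dog saw the cat and barked."
for line in concordance(sample, "cat", width=15):
    print(line)
```

Lining the keyword up in a central column is what makes recurring patterns to its left and right easy to spot.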

Other tools/sets of different tools

Word Class taggers

Word class taggers - tools that analyse the words in your text and mark each word's part of speech.

These are two taggers available for free online:

The CCG POS tagger results look like this, with a key below the extract:

Text used:

Poe, Edgar Allan, 1809-1849:  The Tell-Tale Heart (Penguin Classics)
Cambridge 2011
ProQuest Information and Learning
Penguin Classics

Text Analysis: Statistics

A good place to start is to get some statistics about your chosen texts, to find out a bit more about them. There are many free tools online that will give you statistics about a text, but one we recommend is Voyant Tools.

Voyant Tools is a web-based text reading and analysis environment. It is a scholarly project that is designed to facilitate reading and interpretive practices. Do the exercise below to learn how to use a tool like Voyant and to see what kind of information it can give you.


Voyant Tools - Exercise

For this exercise you will need to copy some text from an online source. You can find suitable texts by using the eTexts tab at the top of this page.

  1. Open Voyant Tools.
  2. Paste your chosen text into the search box and press Reveal.

You should be presented with something that looks like this:

Let's look at each part in a bit more detail to see what information it contains.

In the bottom right corner look at the summary:

This will tell you how many words are in your text, and how many of them are unique words. What does this tell us about Poe's use of language in The Tell-Tale Heart? You may need to paste in other texts and compare them to get an idea about how authors tend to write in comparison with Poe. With this tool you can compare two or more different authors, or multiple texts by the same author.
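The two summary figures can be approximated in a few lines of Python. This is a minimal sketch and not how Voyant itself is implemented: the function names `tokenize` and `summary` and the simple tokenising rule are assumptions made for illustration, so the counts may differ slightly from those a tool reports.

```python
import re

def tokenize(text):
    # Assumed rule: lower-case runs of letters, allowing internal apostrophes ("don't")
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def summary(text):
    """Count total words, unique words, and the ratio between the two."""
    tokens = tokenize(text)
    unique = set(tokens)
    # Ratio of unique words to total words; 0.0 for an empty text
    ratio = len(unique) / len(tokens) if tokens else 0.0
    return {
        "total_words": len(tokens),
        "unique_words": len(unique),
        "unique_ratio": ratio,
    }

# Invented sample sentence, used only to show the call
print(summary("The cat saw the other cat."))
```

A higher ratio of unique words to total words suggests a more varied vocabulary, but longer texts tend to have lower ratios, so it is best to compare texts or extracts of similar length.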

Have a look at the most frequent words used. In this Poe extract the most frequent words used are Louder, Increased and Noise. You can use this information to find out how often these words appear in the English language - see the Corpora tab at the top of this page to find out more.
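A frequency list of this kind can be sketched in Python as follows. The function name `most_frequent` and the very short stopword list are assumptions for illustration only. Very common words such as "the" and "and" are filtered out here because they would otherwise dominate the list; a real tool's stopword list is likely to be much longer.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not any tool's actual list)
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "i", "was", "is"}

def most_frequent(text, n=3, stopwords=STOPWORDS):
    """Return the `n` most frequent words that are not in `stopwords`."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(word for word in words if word not in stopwords)
    return counts.most_common(n)

# Invented sample sentence, used only to show the call
sample = "The noise grew louder and louder and the noise increased."
print(most_frequent(sample, n=2))
```

Passing an empty set as `stopwords` shows how different the list looks when function words are left in.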

Next, have a look at the graph in the top right corner. This displays the appearance of those frequent words throughout the text, so you can visually see which ones appear at the same time as each other.

We can see in the Poe example that the word Sound is used a lot at the beginning of the text, but this stops, and later the words Heard and Louder appear very often together.
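The idea behind a graph like this can be sketched in Python by cutting the text into equal-sized segments and counting a word in each one. This is an illustration under stated assumptions, not Voyant's own method: the function name `distribution`, the tokenising rule and the default of ten segments are all hypothetical choices.

```python
import re

def distribution(text, word, segments=10):
    """Count occurrences of `word` in each of `segments` equal slices of the text."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = [0] * segments
    target = word.lower()
    for position, token in enumerate(tokens):
        if token == target:
            # Map the word's position in the text to a segment index from 0 to segments - 1
            counts[position * segments // len(tokens)] += 1
    return counts

# Invented sample text, used only to show the call
sample = "sound sound quiet quiet louder louder"
print(distribution(sample, "sound", segments=3))
print(distribution(sample, "louder", segments=3))
```

Comparing the lists for two words shows whether they cluster in the same parts of the text, which is what the lines on the graph let you see at a glance.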