Training - Part 2: Text Analysis - Course II: Paper 4: The History of the English Language to c.1800

What is Text Analysis?

Many websites or software programs allow you to analyse your chosen texts. Text analysis tools allow you to explore a text quantitatively, e.g. by counting instances of one particular word, and systematically, e.g. by looking at the types of words and phrases used. This can be particularly useful for finding all instances of a specific word within a text. Some tools will also list all the words in your chosen text by type, e.g. adjective or plural noun.

Using the text analysis tools allows you to compare two or more texts and lets you gather key features of the language used. You can search for the occurrences of just one word, or a more complex pattern, e.g. pairs of words within one context.
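To make the idea of searching for a word in its context more concrete, here is a minimal Python sketch of a "keyword in context" listing. It is only an illustration under stated assumptions: the function name kwic, the window size, and the sample sentence are all invented for this example and are not taken from any of the tools described in this guide.

```python
import re

def kwic(text, keyword, window=3):
    """List each occurrence of keyword with `window` words on either side."""
    # Simple tokenizer: runs of letters and straight apostrophes only.
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Invented sample sentence, not a quotation from any real text.
sample = "The old man heard a sound. The sound grew louder as the man listened."
for line in kwic(sample, "sound", window=2):
    print(line)
```

Each printed line shows one occurrence of the search word in square brackets with its neighbouring words, which is the kind of view you can use to compare how a word is used in different places.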

These tools are good for looking at the different ways authors write across genre or type, e.g. fiction and non-fiction.

Researchers also put them to use to examine questions of authorship. With the tools available you can search your own chosen texts. You can also use established corpora like the British National Corpus to look for common occurrences of words and common phrases.

Quick Links

Corpus of Middle English Prose and Verse

Dictionary of Old English: Web Corpus

Early English Books Online

Eighteenth Century Collections Online

Lexicons of Early Modern English

Middle English Dictionary

Oxford English Dictionary

Oxford Text Archive

SOLO

Voyant Tools

Where to start

When starting with Text Analysis you can begin with the most basic tasks to give yourself a launching point for thinking more deeply about your texts. Any basic analysis you do will have to be followed up with more detailed analysis, so don't stop after the first few steps or your work will be incomplete.

First, you can get some basic statistics using a text analysis website like Voyant Tools - this can give you an overview of the statistics of your whole text, as well as highlight frequently used words or words used in unusual contexts. You can follow this up by using Dictionaries and Corpora to deepen your understanding of the word you are analysing. Work through the exercises below to get an introduction to what you can achieve with Text Analysis.

Note: The examples given are generic, so as to not be specific to the time period you may be looking at, but the techniques are the same.

Text Analysis: Statistics

There are many free tools online that will give you statistics about a text, but one we recommend is Voyant Tools.

Voyant Tools is a web-based text reading and analysis environment. It is a scholarly project that is designed to facilitate reading and interpretive practices. Do the exercise below to learn how to use a tool like Voyant and to see what kind of information it can give you.

Exercise One: Voyant Tools

If you found some texts in Part 1 of this training programme, then you can copy and paste those to use in this exercise - or you can choose something else. We have chosen an online text of The Tell-Tale Heart by Edgar Allan Poe.

Open https://voyant-tools.org/
Paste your chosen text into the search box and press Reveal.

You should be presented with something that looks like this:

Let's look at each part in a bit more detail to see what information it contains.

In the bottom right corner look at the summary:

This will tell you how many words are in your text, and how many of them are unique words. What does this tell us about Poe's use of language? You may need to paste in other texts and compare them to get an idea about how authors tend to write in comparison with Poe. With this tool you can compare two or more different authors, or multiple texts by the same author.
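If you would like to see how figures like these can be worked out, the following Python sketch counts the total words and the unique words in a piece of text. It is a simplified illustration, not a description of how Voyant itself calculates its summary: the tokenizing rule, the function names, and the sample sentence are all assumptions made for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words (letters and straight apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def summarize(text):
    """Return total words, unique words, and the ratio of unique to total."""
    tokens = tokenize(text)
    types = set(tokens)
    ratio = len(types) / len(tokens) if tokens else 0.0
    return len(tokens), len(types), ratio

# Invented sample sentence, not a quotation from Poe.
sample = "The noise grew louder and louder until the noise filled the room."
total, unique, ratio = summarize(sample)
print(total, unique, round(ratio, 2))  # 12 8 0.67
```

A higher ratio of unique words to total words suggests a more varied vocabulary, but the ratio falls naturally as texts get longer, so it is safest to compare texts of similar length.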

Have a look at the most frequent words used. In this Poe extract the most frequent words used are Louder, Increased, Noise. Later, in Exercise Two, we will use this information to find out how often these words appear in the English language.
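A list of most frequent words can be sketched in a few lines of Python. Frequency lists of this kind usually set aside very common function words (such as "the" and "and") so that the more distinctive words stand out. The short stopword list, the function name, and the sample text below are all assumptions made for this illustration; a real tool will use a much longer list.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list for illustration only.
STOPWORDS = {"the", "and", "a", "of", "to", "in", "it", "i", "was", "until"}

def top_words(text, n=2, stopwords=STOPWORDS):
    """Return the n most frequent words that are not in the stopword list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

# Invented sample text, not a quotation from Poe.
sample = ("The noise grew louder and louder until the noise filled the room. "
          "The noise increased.")
print(top_words(sample))  # [('noise', 3), ('louder', 2)]
```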

Next, have a look at the graph in the top right corner. This displays the appearance of those frequent words throughout the text, so you can visually see which ones appear at the same time as each other.

We can see in the Poe example that the word Sound is used a lot at the beginning of the text, but this stops, and later the words Heard and Louder appear very often together.
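The idea behind a graph like this can be sketched in Python by cutting the text into equal segments and counting a word in each one. This is an assumption about the general technique rather than a description of Voyant's own method, and the function name and the sample word sequence are invented for the example.

```python
import re

def distribution(text, word, segments=3):
    """Count how often word occurs in each of `segments` equal slices of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    counts = []
    for s in range(segments):
        # Integer arithmetic gives exactly `segments` slices covering every token.
        start = s * n // segments
        end = (s + 1) * n // segments
        counts.append(tokens[start:end].count(word.lower()))
    return counts

# Invented word sequence that imitates the pattern described above.
sample = ("sound sound sound quiet quiet quiet "
          "heard louder heard louder heard louder")
print(distribution(sample, "sound"))   # [3, 0, 0]
print(distribution(sample, "louder"))  # [0, 1, 2]
```

Reading the two lists side by side shows one word fading out as the other takes over, which is what the lines on the graph display visually.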

Once you have identified a word or two of interest to you, then you can look further at meaning, history, and context.

Dictionaries

Your next step could be to take a closer look at any words you have identified via the first step above, by using dictionaries.

You’re probably all familiar with the OED but some things to note are:

Use the Advanced search (this tip goes for any resource you’re using, as it will give you increased functionality). The OED advanced search has more options for search limits and more flexibility.

Historical thesaurus: allows you to chart the linguistic progress of a chosen object, concept, or expression

Sources: Explore the top authors quoted in the OED

Timelines: a graphical representation showing when words entered the English language

Historical Dictionaries

You will find Historical Dictionaries on both EEBO and ECCO. Use the Advanced Search to search for the word "Dictionaries".

One historical dictionary that many students have made use of in the past is Samuel Johnson’s Dictionary. You can access the page images through ECCO and it is fully searchable. It’s also available online at https://johnsonsdictionaryonline.com/ but that hasn’t been fully transcribed yet.

Lexicons of Early Modern English

Lexicons of Early Modern English is a historical database of monolingual, bilingual, and polyglot dictionaries, lexical encyclopedias, hard-word glossaries, spelling lists, and lexically-valuable treatises surviving in print or manuscript from about 1475 to 1755.

Texts of word-entries whose headword (source) or explanation (target) language is English tell us what speakers of English thought about their tongue in the period. Their lexical insights shaped the history of our living tongue. Any contemporary's testimony about the meaning of their own words has an undeniable authority. For this reason, LEME is not a period dictionary like The Middle English Dictionary; rather, it is a source of "contemporary comments" that illustrate word usage.

Exercise Two: LEME

First: Read through the introductory help notes on LEME.
This is a very short introduction, and will give you a quick overview of how the website works, and how to search efficiently.

Second: Click on the Search button on the top menu, select Advanced Search and input a word to search for. If you don't have a specific search in mind that you want to try, search for the word Assembly. Limit your search by date - perhaps 1450 to 1650. Then press the Search button.

Third: Explore the results you have - don't be afraid of clicking on results to see what information they give you.

Continue your exploration of the site by using the LEME Word List to browse - input a word and look at the word list to find alternative spellings or adjacent words.

Comparing a text against a whole language

We've seen with the above tools how you can look up individual words to find out more about them, but if you want to compare a word to a sample of a whole language, then you will need to use a Corpus.

A corpus is a collection of texts or text extracts that have been put together to be used as a sample of a language or language variety. It consists of texts that have been produced in 'natural contexts' (published books, ordinary conversation, letters, newspapers, lectures etc), which means it mirrors natural language. A well-composed corpus can be used to answer questions about language use, such as:

Does 'wicked' generally mean 'good' or 'bad'? Has this meaning changed over time? Does the use differ between different kinds of text? Do different (kinds of) speakers use the word in the same way?

A reference corpus (created to be a balanced sample of a language variety) can be used as the basis of comparison between a text/genre and 'standard language'.

Specialised corpora can be used to examine or compare different language varieties, such as language from a particular area, covering a certain genre or text type, produced by particular language users, etc.

Corpora can be synchronic (covering one period of time) or diachronic (covering several time periods), consist of different media (written or spoken language) and be composed of different languages.

Annotated corpora have extra information added, usually linguistic information (part-of-speech, lemmata) or metadata (information about the material in the corpus, speakers/authors, situation, extra-linguistic information etc).

There are corpora that can be consulted online, via a custom-built interface, and ones that you explore with stand-alone tools that you install on your computer.
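When you compare a single text against a reference corpus, as described above, raw counts are misleading because the two are very different sizes. A common approach is to convert each count into a rate per million words first. The Python sketch below shows the arithmetic only; every number in it is hypothetical and chosen purely for illustration, and the function name is an assumption of this example.

```python
def per_million(count, total_words):
    """Convert a raw count into occurrences per million words."""
    return count * 1_000_000 / total_words

# Hypothetical figures: a word occurs 5 times in a 2,000-word text
# and 1,500 times in a 100-million-word reference corpus.
text_rate = per_million(5, 2_000)
corpus_rate = per_million(1_500, 100_000_000)

print(text_rate)                          # 2500.0
print(corpus_rate)                        # 15.0
print(round(text_rate / corpus_rate, 1))  # 166.7
```

With these invented figures the word would be far more frequent in the text than in the reference corpus, which is the kind of difference that is worth following up with closer reading.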

Useful links

The Oxford Text Archive (OTA) contains some of the most useful Corpora for this paper, and they are available to download. Some examples include:

The Lampeter Corpus of Early Modern English Tracts
Parsed Corpus of Early English Correspondence (PCEEC)
A Corpus of English Dialogues 1560-1760 (CED)
Dictionary of Old English Corpus in Electronic Form (DOEC)
The English language of the north-west in the late Modern English period: a Corpus of late 18c Prose
The York-Toronto-Helsinki Parsed Corpus of Old English prose (YCOE)
Corpus of Early English Correspondence Sampler (CEECS)
The York-Helsinki parsed corpus of Old English poetry (YCOEP)
Anthology of Middle English texts
Complete corpus of Old English

Downloading these Corpora from the OTA will give you files that will need to be used in software that can process Corpora.

At this stage, if you are ready to start using Corpora you can follow the steps below. It is a good idea to start early with Corpora, as they can be difficult to get used to if you have never used one before.

Optional Exercise

Step One: Download AntConc - this software will enable you to create your own Corpora, or use data you have downloaded from elsewhere.

Step Two: Visit the Oxford Text Archive and search for one of the Corpora listed above. A good one to start with is the Parsed Corpus of Early English Correspondence. You will need to log in to the site by selecting Oxford before you are allowed to download the files.

Step Three: The data will download in a Zip (compressed) file which you will need to unzip (open and uncompress) in order to use.

Step Four: Add the files to the AntConc software you have downloaded. To do this select File from the top menu, then Open Dir and select the folder that contains the data files you downloaded from the Oxford Text Archive.
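If you prefer to unzip the download with a short script rather than by hand, the unzipping step above can be sketched in Python as follows. This is an optional alternative, not part of the AntConc workflow itself. The file and folder names in the commented-out usage lines are hypothetical placeholders, so replace them with the names of your own download.

```python
import zipfile
from pathlib import Path

def unzip_corpus(zip_path, target_dir):
    """Extract every file from the archive and return the extracted file paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # List the files so you can check what you will be loading into AntConc.
    return sorted(p for p in target.rglob("*") if p.is_file())

# Example usage (hypothetical names, uncomment and adjust to your download):
# for path in unzip_corpus("downloaded_corpus.zip", "corpus_files"):
#     print(path)
```

The folder you pass as the second argument is the one you would then select with Open Dir in AntConc.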

The creators of AntConc have produced extensive video guides, which you will find on YouTube. We highly recommend that you work through the whole series before beginning any analysis, so that you understand all the functions and how to get the most out of the software.

Next Steps

Now that you've worked through the training session, you can scroll back to the top and have a look through the different tabs. You'll find sections on recommended eBooks & reference, eJournals, Dictionaries, Primary Texts Online, Newspapers & Ephemera, Text Analysis Tools, and Corpora.

If you have questions about the exercises on this training guide, or on any of the resources, please email efl-enquiries@bodleian.ox.ac.uk
