OxLip+ subscription e-resources
British National Corpus: a very large (over 100 million words) corpus of modern English, both spoken and written. Click here for more information about which BNC interface to use.
American National Corpus: American English, both spoken and written. It will contain 100 million words when completed.
For corpus-based research, purpose-built corpora are available on OxLip+ for various languages, including Czech, Dutch, French, Frisian, German and Spanish. Browse by subject for Linguistics/Corpora.
Lexical databases and corpora is a non-exhaustive list of databases and corpora, organised by language.
CHILDES is a database of spoken child language with browsable transcripts in various languages, including Romance, Scandinavian, Celtic, Germanic and East Asian languages.
CELEX is a lexical database for Dutch, English and German, developed by the Max Planck Institute.
For each language, this data set contains detailed information on:
It is also possible to compile your own corpus from a collection of texts, e.g. from online newspapers (Nexis) or from older books in EEBO (Early English Books Online), Eighteenth Century Collections Online or Eighteenth Century Journals. For more corpora, browse OxLip+ by subject for Linguistics/Corpora.
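As a rough illustration of what compiling your own corpus can involve once the texts have been saved locally, here is a minimal Python sketch. It assumes the texts are stored as plain-text `.txt` files in UTF-8 in a single folder; the function names (`build_corpus`, `tokenize`, `word_frequencies`) are hypothetical and chosen only for this example, and the tokenisation rule is deliberately crude (lowercase ASCII letters with an optional apostrophe), so it is suited to modern English at best.

```python
import re
from collections import Counter
from pathlib import Path


def build_corpus(folder):
    """Read every .txt file in `folder` into a {filename: text} dictionary.

    Assumption: files are plain text encoded as UTF-8.
    """
    corpus = {}
    for path in sorted(Path(folder).glob("*.txt")):
        corpus[path.name] = path.read_text(encoding="utf-8")
    return corpus


def tokenize(text):
    """Lowercase the text and extract word tokens.

    A crude rule for illustration only: runs of a-z, optionally followed
    by an apostrophe and more letters (e.g. "don't").
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequencies(corpus):
    """Count how often each token occurs across all texts in the corpus."""
    counts = Counter()
    for text in corpus.values():
        counts.update(tokenize(text))
    return counts
```

With a folder of saved texts (say, a hypothetical `my_texts`), `word_frequencies(build_corpus("my_texts")).most_common(20)` would list the twenty most frequent word forms. For anything beyond a quick look, a dedicated tagging or parsing tool will usually be more appropriate than this kind of hand-rolled tokeniser.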
For tagging and parsing texts and corpora, have a look at this list of tools.
A really good guide to creating corpora has been developed by IT Services.