Lecture Summary (01/30/2007)

This session was a revision class to prepare us for the end-of-term test scheduled for a few days later. We looked at last year's end-of-term test and tried to answer its questions. This gave us an overview of the main topics of the class: dictionaries in general, pronunciation, spelling/orthography, morphology and lexicography. The session was really helpful for understanding some aspects and for preparing for the test in general.
30.1.07 14:16

Lecture Summary (01/23/2007)

This time the topic was "Computational Lexicography".

The lecture started with a review of lexicographic principles. Good lexicography requires a certain quantity, meaning completeness of coverage. This includes the number of entries (extensional coverage) and the number of types of lexical information (intensional coverage). Another aspect is of course quality. The information given has to be correct (types of lexical information) and the structure must be consistent (macrostructure, microstructure, mesostructure).
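To make the two kinds of coverage more concrete for myself, here is a minimal sketch (not something shown in the lecture). The toy lexicon, its field names and the three function names are my own assumptions; the consistency check simply treats an entry as incomplete if it lacks an information type that some other entry has.

```python
# Hypothetical toy lexicon: headword -> microstructure (types of lexical information).
lexicon = {
    "dog": {"pos": "noun", "pronunciation": "/dɒɡ/", "definition": "a domestic animal"},
    "poodle": {"pos": "noun", "definition": "a breed of dog"},
}

def extensional_coverage(lex):
    """Number of entries in the lexicon."""
    return len(lex)

def intensional_coverage(lex):
    """Set of all information types used in at least one entry."""
    # Iterating over an entry (a dict) yields its keys, so union() collects field names.
    return set().union(*lex.values())

def missing_information(lex):
    """Map each incomplete headword to the information types it lacks."""
    all_types = intensional_coverage(lex)
    gaps = {}
    for headword, entry in lex.items():
        missing = sorted(all_types - set(entry))
        if missing:
            gaps[headword] = missing
    return gaps

print(extensional_coverage(lexicon))
print(sorted(intensional_coverage(lexicon)))
print(missing_information(lexicon))
```

In this toy case the entry for "poodle" has no pronunciation, so the microstructure is not consistent across entries.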

After that Mr Gibbon showed us a model to illustrate the so-called lexicographic workflow cycle.

The process starts with the data acquisition:

  • Recordings
  • Text collection
  • Concordance
  • Dictionaries
  • ...

The next step is the lexicon construction:

  • Metadata
  • Information retrieval
  • Linguistic analyses

The cycle continues with the access to data:

  • Traditional print media
  • Hyperlexicon: CD, internet
  • Software with lexicon components: word processing; speech processing

Finally the lexical evaluation takes place:

  • Internal: consistency; completeness
  • External: utility for the users

Then we took a closer look at lexical data acquisition, especially at concordances. Mr Gibbon introduced the term KWIC (KeyWord In Context) concordance and called it a special kind of preliminary, corpus-based dictionary. In a KWIC concordance each word in a text corpus is paired with its contexts of occurrence in this corpus. A short example of a KWIC concordance with right-hand contexts: in the (very small) text corpus "I like football" the word "I" would be paired with "like" and "like" would be paired with "football". Of course Mr Gibbon chose a more complex example to illustrate this concept: he showed us an extract from "Notes from a Small Island" by Bill Bryson, followed by the respective KWIC concordance. We also learned that Google is a special form of KWIC concordance.

The process of creating a KWIC concordance basically consists of six steps:

  1. Corpus creation: creating a corpus of texts in electronic format
  2. Tokenisation: eliminating punctuation marks and capital letters plus breaking the text into context units
  3. Keyword list extraction: listing all words of the corpus alphabetically and removing duplicate words
  4. Context collation: pairing the keywords with their respective left and right contexts
  5. Search: searching for a keyword and its contexts in the corpus
  6. Output format

Afterwards Mr Gibbon showed us the process of computing a KWIC concordance. Finally we found out that KWIC concordances make the search for lexical information more efficient by putting information about words in one place.
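The six steps can be sketched in a few lines of Python. This is only my own minimal reconstruction, not the program Mr Gibbon used; the function names, the context width parameter and the tiny two-sentence corpus are assumptions made for illustration.

```python
import re
from collections import defaultdict

def tokenise(text):
    """Step 2: lower-case the text and drop punctuation, keeping only word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(text, width=2):
    """Steps 3 and 4: pair each keyword with its left and right contexts."""
    tokens = tokenise(text)
    concordance = defaultdict(list)
    for i, word in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        concordance[word].append((left, right))
    # Alphabetical keyword list; duplicates are merged because keywords are dict keys.
    return dict(sorted(concordance.items()))

def show(concordance, keyword):
    """Steps 5 and 6: look up one keyword and format its lines for output."""
    return [f"{left:>20} [{keyword}] {right}"
            for left, right in concordance.get(keyword, [])]

# Step 1: a (very small) corpus in electronic format.
corpus = "I like football. I like tennis."
table = kwic(corpus, width=1)
for line in show(table, "like"):
    print(line)
```

All contexts of a keyword end up in one place (one dictionary entry), which is exactly what makes the lookup efficient.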

29.1.07 23:01

Lecture Summary (01/16/2007)

In this meeting we dealt with the topic "Types of lexical information: semantics".

First of all we had a short revision of the main kinds of definitions (componential definition, syntagmatic definition, paradigmatic definition). Then we had to analyse a corpus, namely a text about poodles (one of Mr Gibbon's favorite topics). We learned that poodle is a hyponym (hypo meaning subordinate) of dog; consequently dog is a hyperonym of poodle. Poodle and terrier are co-hyponyms, since both are hyponyms of dog. In the context of the corpus, cross and hybrid are synonyms because they have the same meaning. An example of antonyms would be boy and girl.

Mr Gibbon also revised the microstructure and showed us once more his model of types of lexical information and added models of microstructure information types (appearance, structure, meaning) and lexical semantics. Furthermore we looked once again at standard dictionary definitions and the components of definitions. In this respect Mr Gibbon introduced a new term for the genus proximum hierarchy (tree structure): taxonomy. Taxonomies are used in traditional lexicography for cross-references in standard definitions and for thesaurus construction. They are also of importance for other fields like artificial intelligence and text technology. Then we saw a summary of semantic relations featuring taxonomy (including hyperonym and hyponym) and meronomy (part-whole relation, syntagmatic relations).
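A taxonomy as a tree structure can be written down very simply. The following is a minimal sketch of my own, assuming each word stores only its direct hyperonym; the dictionary name, the function names and the extra node "animal" are hypothetical and were not part of the lecture material.

```python
# Hypothetical toy taxonomy: each word points to its direct hyperonym (None for the root).
HYPERONYM = {
    "animal": None,
    "dog": "animal",
    "poodle": "dog",
    "terrier": "dog",
}

def hyperonyms(word):
    """Return the chain of superordinate terms above `word`, nearest first."""
    chain = []
    parent = HYPERONYM.get(word)
    while parent is not None:
        chain.append(parent)
        parent = HYPERONYM.get(parent)
    return chain

def co_hyponyms(word):
    """Return the other words that share the same direct hyperonym."""
    parent = HYPERONYM.get(word)
    if parent is None:
        return []
    return sorted(w for w, p in HYPERONYM.items() if p == parent and w != word)

print(hyperonyms("poodle"))   # ['dog', 'animal']
print(co_hyponyms("poodle"))  # ['terrier']
```

Following the chain upwards gives the genus proximum first, which is the cross-reference a standard definition would use.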

Finally Mr Gibbon gave another example by showing us a text about ginger beer.

29.1.07 22:04

Lecture Summary (12/19/2006)

The topic of this session was "Types of lexical information: grammar (parts of speech categories & subcategories)".

The first aspect dealt with was sentence syntax. First of all we discussed the noun categories. Determiners can be divided into four categories: articles (definite: the; indefinite: a), possessives (my, your, his, her, its, our, their), demonstratives (proximal: this; distal: that) and quantifiers. The different types of adjectives are scalar, polar, appraisive and ordinal. A noun is either a proper noun (a name) or a common noun (countable or uncountable). The category of pronouns includes personal pronouns, possessive pronouns, demonstrative pronouns, quantifier pronouns and relative pronouns.

Afterwards we looked at the verb categories. There are two main types of verbs: main verbs (finite forms and non-finite forms) and periphrastic verbs (auxiliary verb + non-finite main verb). Adverbs come in six different types: deictic, time, place, direction, manner and degree.

The last group was named glue categories, a term specific to Mr Gibbon. It includes prepositions, conjunctions and interjections.

After that Mr Gibbon informed us about the structure of language. We learned that there is a certain sign hierarchy (from top to bottom):

  • dialogue
  • monologue/text
  • sentence
  • word
  • morpheme
  • phoneme

Mr Gibbon answered the question "What is structure?" by pointing out the two kinds of constitutive relations: structural relations (including syntagmatic relations and paradigmatic relations) and semiotic relations (realisation and interpretation). Furthermore we learned that a syllable consists of an onset, a nucleus and a coda (nucleus and coda form the rhyme).
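The syllable structure can be illustrated with a small sketch of my own. It assumes a hypothetical, heavily simplified vowel inventory in which letters stand in for phonemes, so it is only a rough approximation and not a real phonological analysis.

```python
# Hypothetical simplified vowel inventory; letters stand in for phonemes here.
VOWELS = {"a", "e", "i", "o", "u"}

def split_syllable(segments):
    """Split one syllable (a list of segments) into (onset, nucleus, coda)."""
    vowel_positions = [i for i, s in enumerate(segments) if s in VOWELS]
    if not vowel_positions:
        raise ValueError("a syllable needs a nucleus")
    first = vowel_positions[0]
    last = first
    # Extend the nucleus over directly adjacent vowels (e.g. a diphthong).
    while last + 1 < len(segments) and segments[last + 1] in VOWELS:
        last += 1
    onset = segments[:first]
    nucleus = segments[first:last + 1]
    coda = segments[last + 1:]
    return onset, nucleus, coda

onset, nucleus, coda = split_syllable(list("strand"))
rhyme = nucleus + coda  # nucleus and coda together form the rhyme
print(onset, nucleus, coda, rhyme)
```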

29.1.07 20:27

Lecture Summary (12/12/2006)

This lecture had two parts. The first one was a presentation by Sascha Griffith of SIL (Summer Institute of Linguistics) on a database system called Toolbox (formerly known as Shoebox). It is used, for example, to create a lexical database of a little-known language. Mr Griffith illustrated the program by showing us a screenshot of the microstructure. This was very interesting for us because it showed that terms we have come across in this course, like "microstructure", are of importance for linguistic work. Afterwards we saw some functions of Toolbox in action.

In the second part of the session Mr Gibbon continued with some aspects of the previous lecture, namely inflection and word formation. The function of inflection is to mark the syntagmatic relation of words to their contexts; it is realized as a stem plus an affix (prefix, suffix, circumfix, infix, superfix). Root/morpheme creation, in contrast, creates new parts of speech and new meanings from parts of two or more existing stems. Derivation has the same function but is formed with only one stem plus an affix. Compounding creates meaning by combining at least two existing stems.

Furthermore we learned more about the internal structure of English. English words consist of a stem and an inflection. In English, inflection means suffixes or stem vowel changes; in other languages you can also find prefixes (many African languages), circumfixes (German) or superfixes. Stems of English words can be either (simple) roots (lexical morphemes) or (more complex) derivations, compounds or even both (synthetic compounds). Afterwards Mr Gibbon showed us an illustration of the hierarchy of words and their parts.

29.1.07 12:08

Lecture Summary (12/05/2006)

The topic of this session was called "Types of lexical information: morphology (inflection and word formation)".

As an introduction to the field of word formation we asked ourselves why and by whom it is used. It is an essential aspect of any language because new concepts, ideas or technical devices require new words. It also happens that new words are invented on the spot. Consequently word formation is of importance to scientists, engineers, product branding companies (for the purpose of illustration we looked at the website of nomen) and poets. But of course it is also a part of everybody's everyday life.

Afterwards Mr Gibbon introduced the poem "Jabberwocky" to us. It was written by Lewis Carroll and appeared in his book "Through the Looking-Glass". It is full of pseudowords which look like English words but are in fact inventions of the author. Therefore "Jabberwocky" is a good example of word formation.

The next aspect dealt with in the lecture was morphological structure, which includes inflection and word formation. Inflection marks the relation of words to their contexts. For this purpose a grammatical morpheme (in English in the form of a suffix) is added. The class of the word and its central meaning do not change. Word formation is done either by derivation (a stem plus an affix) or by compounding (at least two stems are combined). This may change the part of speech and the meaning.

Morphemes are defined as the smallest meaningful parts of words. They can be divided into two main types. Lexical morphemes (content morphemes, roots) are an open set including free words like boy, girl, man, etc. Grammatical morphemes (structural morphemes) are a closed set and either free (prepositions, conjunctions, auxiliary verbs) or bound (affixes in word formation and inflection).

Finally we talked about allomorphs. Allomorphy means that one morpheme is realized in different variants (allomorphs). As an example we chose the plural morpheme of nouns. In this case the same morpheme can be realized for example with an -s (e.g. dog -> dogs) or -en (ox -> oxen) or a stem vowel change (man -> men).
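The idea of one morpheme with several realizations can be sketched as one function with several outcomes. This is a deliberately naive illustration of my own: the lookup table and the function name are hypothetical, and the default rule just appends -s, ignoring spelling variants such as -es.

```python
# Hypothetical lookup table for nouns whose plural is not formed with -s.
IRREGULAR_PLURALS = {
    "ox": "oxen",  # realized with -en
    "man": "men",  # realized with a stem vowel change
}

def plural(noun):
    """Realize the plural morpheme: irregular form if listed, otherwise -s."""
    return IRREGULAR_PLURALS.get(noun, noun + "s")

for noun in ["dog", "ox", "man"]:
    print(noun, "->", plural(noun))
```

The function stands for the single plural morpheme; which branch is taken corresponds to which allomorph is chosen.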

25.1.07 14:34

Lecture Summary (11/28/2006)

Topic: "Types of Lexical Information: Pronunciation"

As a first step we took a look at the surface structure, which can be divided into the linguistic description (metalanguage) and the units of language themselves (object language). In a dictionary the metalanguage is represented for example by the typography and layout; the object language can be found in the information on spelling and pronunciation of the words.

There are two ways of rendering information: pronunciation and spelling. The rendering of structures comes in three different versions: the acoustic modality (pronunciation rules), the visual modality (spelling) and the inter-modality conversion (sound-spelling rules). In dictionaries the sound is represented by phonemes and syllables in a prosodic hierarchy. The basic syllable structure of the English language can be described as CCCVVCCC (C = consonant, V = vowel). Here it has to be considered that affricates count as one phoneme. The definition of phoneme depends on the sign component it focuses on, so there are four different ways of defining the term.
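The CCCVVCCC template can be tested with a regular expression. The sketch below rests on my own reading of the template, namely that it allows up to three onset consonants, one or two vowel slots and up to three coda consonants; this interpretation and the function name are assumptions, not something stated in the lecture.

```python
import re

# Assumed reading of CCCVVCCC: 0-3 consonants, 1-2 vowel slots, 0-3 consonants.
TEMPLATE = re.compile(r"C{0,3}V{1,2}C{0,3}")

def fits_template(cv_pattern):
    """Check whether a string of C and V symbols fits the syllable template."""
    return TEMPLATE.fullmatch(cv_pattern) is not None

print(fits_template("CCCVCC"))  # True
print(fits_template("CCCCVC"))  # False: four consonants before the vowel
print(fits_template("CC"))      # False: no vowel slot filled
```

Because affricates count as one phoneme, each of them would be written as a single C when a word is converted into such a pattern.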

The description of sounds in dictionaries contains just enough phonetic detail to distinguish words. If you are looking for a more detailed way of representing pronunciation you have to use phonetic transcription based on articulatory phonetics.

At the end of this session we dealt with spelling-to-sound rules such as "i before e except after c" (-> ceiling).

24.1.07 20:50

