The MLP English Medical Grammar, emgrm.txt, is based on Linguistic String Analysis (Z. Harris, String Analysis of Sentence Structure, Mouton, The Hague, 1962). The grammar is composed of three sections, labeled *BNF, *LISTS, and *RESTR.

The *BNF section contains the definitions, written in Backus-Naur Form, of the grammatical structures found in the sentences of clinical narrative, i.e., in reports of clinic visits, hospital discharge summaries, and the like. The parsing engine ("parser") calls on these definitions to build a "parse tree" for each text sentence, in which the nodes of the tree correspond to elements of the BNF definitions. The parse tree for a given sentence represents the syntactic analysis of the sentence and, at the same time, a record of the path taken through the grammar (the BNF definitions) to arrive at that analysis. An example of a BNF definition is:


Here an ASSERTION is composed of the elements SUBJECT, TENSE, VERB, and OBJECT, interspersed with sentence adjuncts SA and ending with SA-LAST. Sentence adjuncts (e.g. 'generally', 'at night', 'when necessary') are optional. The other elements must be present to form a well-formed ASSERTION, although TENSE may be empty if the verb carries its own tense ('is' vs. 'will be').
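The shape of such a definition can be sketched in Python. This is an illustration only: the exact spelling and element order of the ASSERTION definition in emgrm.txt are not reproduced here, so the sequence below (an SA slot before each required element, SA-LAST at the end) is an assumption drawn from the description above, and `is_wellformed` is a hypothetical helper, not part of the MLP system.

```python
# Assumed rendering of the ASSERTION definition as an ordered list of
# element names. The placement of the SA slots is a guess.
ASSERTION = ["SA", "SUBJECT", "SA", "TENSE", "SA", "VERB", "SA", "OBJECT", "SA-LAST"]

# Adjunct slots are optional (treating SA-LAST as an adjunct slot is an
# assumption); TENSE must be present but may be empty.
OPTIONAL = {"SA", "SA-LAST"}
MAY_BE_EMPTY = {"TENSE"}


def is_wellformed(filled):
    """Check a candidate ASSERTION.

    `filled` maps an element name to the list of words it covers.
    Every non-adjunct element must be present, and only TENSE may
    cover zero words.
    """
    for element in ASSERTION:
        if element in OPTIONAL:
            continue
        if element not in filled:
            return False
        if not filled[element] and element not in MAY_BE_EMPTY:
            return False
    return True
```

With this sketch, a candidate whose TENSE is empty is accepted (the verb 'is' carries its own tense), while one whose VERB is empty or whose TENSE element is missing altogether is rejected.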

The *LISTS section contains lists used in procedures other than building the parse tree, such as "Restrictions", which are procedures that test the parse tree for the presence or absence of particular features.

The *RESTR section contains procedures written in the MLP's "Restriction Language" (cf. The Restriction Language Manual). These procedures include the restrictions proper (parse tree tests) and a set of Routines for navigating the parse tree and carrying out packaged linguistic relations, such as finding the COELEMENT of a given element of a definition, or finding the CORE of a given element, which is the "terminal symbol" (a symbol that is not further defined) at the bottom of a tree branch. The terminal symbols in a parse tree are symbols for parts of speech (N noun, PRO pronoun, ADJ adjective, etc.). These are the same symbols assigned to words in the lexicon.
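To make the two routines named above more concrete, here is a minimal Python sketch of a parse tree with COELEMENT-like and CORE-like lookups. It is not the Restriction Language implementation: the `Node` class, the function names, and the rule that the descent skips nodes listed in `ADJUNCT_LABELS` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical set of adjunct labels that the descent to the core skips.
ADJUNCT_LABELS = {"SA", "SA-LAST"}


@dataclass(eq=False)
class Node:
    label: str
    word: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child):
        """Attach `child` under this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def coelement(node, label):
    """Return the sibling of `node` labeled `label`, or None."""
    if node.parent is None:
        return None
    for sibling in node.parent.children:
        if sibling is not node and sibling.label == label:
            return sibling
    return None


def core(node):
    """Descend from `node` to the terminal symbol at the bottom of its branch."""
    while node.children:
        candidates = [c for c in node.children if c.label not in ADJUNCT_LABELS]
        if not candidates:
            return None
        node = candidates[0]
    return node


# A toy tree; a real parse tree has intermediate nodes that are omitted here.
assertion = Node("ASSERTION")
adjunct = assertion.add(Node("SA"))
subject = assertion.add(Node("SUBJECT"))
obj = assertion.add(Node("OBJECT"))
subject.add(Node("PRO", word="she"))
obj.add(Node("N", word="aspirin"))
```

In this toy tree, `core(subject)` reaches the PRO node for 'she', and `coelement(subject, "OBJECT")` returns the OBJECT node that sits beside SUBJECT under ASSERTION.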

As the parser proceeds from left to right through the sentence, and top to bottom through the BNF definitions, the point of contact occurs when the current element of the grammar is a terminal symbol that matches a part of speech of the current sentence word as given by the lexicon. In the lexicon the word has not only its part-of-speech assignment(s) but also further grammatical and medical "attributes" (subclass memberships). Once the parser associates a terminal symbol of the parse tree with a part of speech in the lexical entry of the current sentence word, the attributes of the word can be tested by a RESTRICTION (e.g. number agreement of subject and verb).
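The matching step and a simple attribute test can be sketched as follows. The lexicon format, the category label V, the attribute names SINGULAR and PLURAL, and both function names are placeholders chosen for this example; the actual MLP lexicon entries and the agreement restriction may look quite different.

```python
# Hypothetical lexicon: word -> {part of speech -> set of attributes}.
LEXICON = {
    "pain": {"N": {"SINGULAR"}},
    "pains": {"N": {"PLURAL"}},
    "persists": {"V": {"SINGULAR"}},
    "persist": {"V": {"PLURAL"}},
}


def match_terminal(terminal, word):
    """Return the word's attributes for part of speech `terminal`.

    Returns None when the lexical entry has no such part of speech,
    i.e. when the terminal symbol does not match the word.
    """
    entry = LEXICON.get(word.lower(), {})
    return entry.get(terminal)


def agrees(subject_word, verb_word):
    """Toy number-agreement test on the attributes of two matched words."""
    subject_atts = match_terminal("N", subject_word)
    verb_atts = match_terminal("V", verb_word)
    if subject_atts is None or verb_atts is None:
        return False
    return bool(subject_atts & verb_atts & {"SINGULAR", "PLURAL"})
```

Here `agrees("pain", "persists")` succeeds because both entries carry SINGULAR, while `agrees("pains", "persists")` fails; `match_terminal("ADJ", "pain")` returns None because the entry for 'pain' has no ADJ reading.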

The restrictions are of two types: Disqualify and Wellformedness.

Disqualify restrictions that are associated with ("housed" on) a given node are executed before the node is constructed. Wellformedness restrictions housed on a given node are executed after the node is built. D-restrictions generally disqualify certain options within an adjunct-type definition. W-restrictions often test the lexical attributes of words attached to terminal nodes of the parse tree.
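The difference in timing between the two restriction types can be shown with a short Python sketch. The function `try_option`, the convention that a D-restriction returns True when it disqualifies an option, and the toy restrictions themselves are assumptions for illustration; they do not reproduce the parser's actual control flow.

```python
def try_option(option, context, d_restrictions, w_restrictions, build):
    """Attempt one option of a definition; return the node or None."""
    # D-restrictions run before the node exists, so they see only the
    # option and its context.
    if any(disqualifies(option, context) for disqualifies in d_restrictions):
        return None
    node = build(option, context)
    # W-restrictions run on the finished node.
    if not all(is_wellformed(node) for is_wellformed in w_restrictions):
        return None
    return node


calls = []  # records the order in which the pieces run


def d_no_words_left(option, context):
    calls.append("D")
    return not context["remaining"]  # disqualify when nothing is left to parse


def w_has_word(node):
    calls.append("W")
    return bool(node["words"])


def build(option, context):
    calls.append("build")
    return {"label": option, "words": context["remaining"][:1]}


built = try_option("SA", {"remaining": ["generally"]}, [d_no_words_left], [w_has_word], build)
skipped = try_option("SA", {"remaining": []}, [d_no_words_left], [w_has_word], build)
```

In the first call the D-restriction passes, the node is built, and the W-restriction then inspects it. In the second call the D-restriction disqualifies the option, so the node is never built and no W-restriction runs.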

The Restriction section of the grammar is divided into linguistically motivated subsections, e.g. CONJUNCTION RESTRICTIONS, WH-RESTRICTIONS. Most of the medical constraints are in the section POSITION RESTRICTIONS. One-line descriptions of all POSITION restrictions can be found in the current version of emgrm at "*DPOSINDEX" and "*WPOSINDEX". Explanations and examples of restrictions throughout the grammar are given in the bracketed comments at the start of each restriction.

Two new sections of restrictions (not in Sager 1981) are WATT and WPHRASE.

WATT (Well-formed Attribute) restrictions treat the lexical ambiguity of particularly troublesome individual words or abbreviations as manifested in multiple medical subclass assignments (attributes). E.g., WATT-CC attempts to resolve the ambiguity of 'cc' (cubic centimeters), attribute NUNIT, vs 'cc' (cardiac catheterization), attribute H-TTSURG.
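The kind of choice WATT-CC has to make can be illustrated with a small Python sketch. The test that WATT-CC actually applies is not described here, so the cue used below (a number immediately before 'cc' suggests a unit of volume) is purely an assumption, and `resolve_cc` is a hypothetical function.

```python
def resolve_cc(tokens, index):
    """Pick one attribute for the 'cc' at tokens[index].

    ASSUMED cue, not the real WATT-CC logic: if the preceding token is
    a number, read 'cc' as a unit (NUNIT); otherwise read it as the
    procedure (H-TTSURG).
    """
    previous = tokens[index - 1] if index > 0 else ""
    # Allow one decimal point, so both "500" and "2.5" count as numbers.
    if previous.replace(".", "", 1).isdigit():
        return "NUNIT"
    return "H-TTSURG"
```

Under this assumption, the 'cc' in `["500", "cc", "of", "saline"]` resolves to NUNIT, and a sentence-initial 'cc' resolves to H-TTSURG.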

A WPHRASE restriction assigns a PHRASE-ATT node attribute to the head of a parse subtree, which will subsequently be treated as a single unit. The current list of phrase attributes includes: AGE-PHRASE, DATE-PHRASE, DOSE-PHRASE, GRAFT-PHRASE, INDIC-PHRASE, INFLUENCE-PHRASE, INST-PHRASE, QUANT-PHRASE, RADIATE-PHRASE, SOURCE-ATT, SOURCE-PHRASE, TESTENV-PHRASE, TIME-POST-PHRASE, TIME-PHRASE.
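A minimal Python sketch of the assignment step follows. The set of phrase attributes is taken from the list above; everything else (the dictionary representation of tree nodes, the node labels, and the functions `assign_phrase_att` and `leaves`) is hypothetical, and the sketch does not show how a WPHRASE restriction decides which attribute applies.

```python
PHRASE_ATTS = {
    "AGE-PHRASE", "DATE-PHRASE", "DOSE-PHRASE", "GRAFT-PHRASE",
    "INDIC-PHRASE", "INFLUENCE-PHRASE", "INST-PHRASE", "QUANT-PHRASE",
    "RADIATE-PHRASE", "SOURCE-ATT", "SOURCE-PHRASE", "TESTENV-PHRASE",
    "TIME-POST-PHRASE", "TIME-PHRASE",
}


def assign_phrase_att(head, phrase_att):
    """Record a phrase attribute on the head node of a subtree."""
    if phrase_att not in PHRASE_ATTS:
        raise ValueError(f"unknown phrase attribute: {phrase_att}")
    head["PHRASE-ATT"] = phrase_att
    return head


def leaves(node):
    """Collect the words under `node`, left to right."""
    if not node.get("children"):
        return [node["word"]]
    words = []
    for child in node["children"]:
        words.extend(leaves(child))
    return words


# Placeholder labels; the real subtree for a dose expression will differ.
dose = {
    "label": "HEAD",
    "children": [
        {"label": "NUMBER", "word": "500"},
        {"label": "UNIT", "word": "mg"},
    ],
}
assign_phrase_att(dose, "DOSE-PHRASE")
```

Once the head carries the attribute, later processing can treat the whole subtree as one unit, for example by reading `dose["PHRASE-ATT"]` together with `" ".join(leaves(dose))`.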

Copyright © 2005 by Medical Language Processing, LLC.