The Regularization component is executed using output trees from the English Transformation (decomposition) component. Each transformation in the connective component creates a <PARSE-CONN> connecting one ASSERTION/FRAGMENT to another ASSERTION/FRAGMENT. The connective structure is arranged according to the Polish (or prefix) notation, but the XML allows the actual connective words to appear between their conjuncts.

PARSE-CONN has the following structure:

    <PARSE-CONN> = <X> ::= <SA> + <LCONNR> + <SA>
    where X is the name of the type of connective, such as
  • CONJOINED — for conjunction;
  • EMBEDDED — for an embedded subject or object;
  • SUB-CONJ — for sentential adjunct CSSTG;
  • RELATION — for a prepositional phrase of an ASSERTION, where P = H-CONN;
  • CHANGE-OF-STATE — for LXR which is an H-CHANGE, an H-TMBEG or an H-TMEND;
  • WITH-CONJ — for a CONJ-LIKE prepositional phrase with P=with;
  • TIME-CONJ — for from ... to ... or between ... and ... phrases;
  • REL-CLAUSE — for relative clauses, carrying '[WH-MOD]' (for wh-phrases) or '[NMOD]' subordinate clauses.
LCONN and RCONN are the left and right adjuncts of the connective HEADCONN. The substructure of HEADCONN depends on the type of connective. It is described in each transformation.
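As a rough aid to reading the grammar rule above, the node can be modeled as a small data structure. This is a minimal sketch, not the component's internal representation. The connective type names come from the list above. The class names `ConnType`, `LConnR` and `ParseConn` are hypothetical, and so is the assumption that LCONNR groups LCONN, HEADCONN and RCONN. That grouping is inferred from the description of LCONN and RCONN as adjuncts of HEADCONN.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ConnType(Enum):
    """Connective types X listed for <PARSE-CONN> (names taken from the text)."""
    CONJOINED = "CONJOINED"
    EMBEDDED = "EMBEDDED"
    SUB_CONJ = "SUB-CONJ"
    RELATION = "RELATION"
    CHANGE_OF_STATE = "CHANGE-OF-STATE"
    WITH_CONJ = "WITH-CONJ"
    TIME_CONJ = "TIME-CONJ"
    REL_CLAUSE = "REL-CLAUSE"


@dataclass
class LConnR:
    """Hypothetical model of LCONNR: left adjunct + HEADCONN + right adjunct."""
    headconn: str                                   # e.g. the connective word "and"
    lconn: List[str] = field(default_factory=list)  # left adjuncts of HEADCONN
    rconn: List[str] = field(default_factory=list)  # right adjuncts of HEADCONN


@dataclass
class ParseConn:
    """<PARSE-CONN> = <X> ::= <SA> + <LCONNR> + <SA> (hypothetical encoding)."""
    conn_type: ConnType
    lconnr: LConnR
    sa_left: Optional[str] = None   # sentence adjunct before LCONNR
    sa_right: Optional[str] = None  # sentence adjunct after LCONNR

    def words(self) -> List[str]:
        """Surface words in left-to-right order, skipping empty adjuncts."""
        parts = [self.sa_left, *self.lconnr.lconn, self.lconnr.headconn,
                 *self.lconnr.rconn, self.sa_right]
        return [p for p in parts if p]


conn = ParseConn(ConnType.CONJOINED, LConnR(headconn="and"))
print(conn.conn_type.value, conn.words())  # CONJOINED ['and']
```

Using an `Enum` keyed by the exact type strings means that a type name read from the XML, such as `"SUB-CONJ"`, maps back to a member with `ConnType("SUB-CONJ")`.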

In general, when one of the transformations finds a relevant substructure in an ASSERTION 'A', it attaches a connective 'PARSE-CONN' to the left of 'A', creates an ASSERTION / FRAGMENT 'B' from 'A', and attaches 'B' to the right of 'A'. In the process, 'A' may be changed to 'A1'. When a connective transformation is completed successfully in assertion 'A', the structure

    <PARSE-CONN> + <A1> + <B>
replaces 'A'. When an assertion in 'A' is moved up, its conjunct (if it has one) is also moved up, together with its CONJ-NODE, so that when the transformation is completed we have:

    <PARSE-CONN> + <A1> + <B> + <CONJ-NODE>
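The replacement step can be illustrated on a flat sequence of nodes. This is a simplified sketch, not the actual tree transformation. The tuple encoding, the function name `connective_transform` and the sample sentence "she was admitted because she had fever and she had chills" are all hypothetical. The CONJ-NODE is appended after B only when the moved-up assertion has a conjunct.

```python
from typing import List, Optional, Tuple

# Hypothetical flat encoding of nodes, for illustration only:
#   ("PARSE-CONN", connective_type, connective_word)
#   ("ASSERTION", text)
#   ("CONJ-NODE", conjunction_word, conjunct_text)
Node = Tuple[str, ...]


def connective_transform(a1: str, b: str, conn_type: str, conn_word: str,
                         conj: Optional[Tuple[str, str]] = None) -> List[Node]:
    """Return the node sequence that replaces assertion A.

    a1   -- A after the embedded material has been removed (A1)
    b    -- the assertion moved up out of A (B)
    conj -- (word, conjunct) if the moved-up assertion had a conjunct
    """
    result: List[Node] = [("PARSE-CONN", conn_type, conn_word),
                          ("ASSERTION", a1),
                          ("ASSERTION", b)]
    if conj is not None:
        # The conjunct moves up together with its CONJ-NODE, after B.
        result.append(("CONJ-NODE", conj[0], conj[1]))
    return result


seq = connective_transform("she was admitted", "she had fever",
                           "SUB-CONJ", "because", ("and", "she had chills"))
for node in seq:
    print(node)
```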

When ASSERTION 'B' is transformed, the first transformation applied is T-CONJ-IN-ASSERTION, which creates a connective 'PARSE-CONN' = 'CONJOINED' to the left of 'B' and attaches the conjunct of 'B' to the right of 'B'. The structure above then becomes:

    <PARSE-CONN> + <A1> + <PARSE-CONN> + <B> + <CONJUNCT OF B>
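The effect of T-CONJ-IN-ASSERTION can be sketched as a rewrite over a flat node sequence. This is only an illustration under stated assumptions. The tuple encoding, the function name `t_conj_in_assertion` and the sample nodes are hypothetical, and the real transformation operates on parse trees. Every ASSERTION that is immediately followed by a CONJ-NODE becomes a CONJOINED connective, then the assertion, then its conjunct.

```python
from typing import List, Tuple

# Hypothetical flat node encoding (illustration only):
#   ("PARSE-CONN", type, word), ("ASSERTION", text), ("CONJ-NODE", word, conjunct)
Node = Tuple[str, ...]


def t_conj_in_assertion(seq: List[Node]) -> List[Node]:
    """Sketch of T-CONJ-IN-ASSERTION on a flat node sequence.

    ASSERTION + CONJ-NODE  ->  PARSE-CONN(CONJOINED) + ASSERTION + conjunct
    """
    out: List[Node] = []
    i = 0
    while i < len(seq):
        node = seq[i]
        nxt = seq[i + 1] if i + 1 < len(seq) else None
        if node[0] == "ASSERTION" and nxt is not None and nxt[0] == "CONJ-NODE":
            _, word, conjunct = nxt
            out += [("PARSE-CONN", "CONJOINED", word), node, ("ASSERTION", conjunct)]
            i += 2  # the CONJ-NODE has been consumed
        else:
            out.append(node)
            i += 1
    return out


before = [("PARSE-CONN", "SUB-CONJ", "because"),
          ("ASSERTION", "she was admitted"),
          ("ASSERTION", "she had fever"),
          ("CONJ-NODE", "and", "she had chills")]
for node in t_conj_in_assertion(before):
    print(node)
```

The output has the shape PARSE-CONN + A1 + PARSE-CONN + B + CONJUNCT OF B.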

In the example "Today, she had no cough, chest pain or shortness of breath", which is expanded into three basic assertions, the result is:
    • Today, she had no cough
    • ","
      • Today, she had no chest pain
      • "and"
      • Today, she had no shortness of breath
In short, the algorithm creates a Polish notation structure to connect assertions.
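To make the Polish-notation idea concrete, the example above can be written as a prefix token list. Each connective applies to the two items that follow it. The list is then parsed into a tree, and two things are read off that tree: the basic assertions, and an infix rendering with each connective word between its conjuncts, as the XML allows. This is a minimal sketch. The token format `("CONN", word)` and the functions `parse_prefix`, `assertions` and `infix` are hypothetical and are not the component's API.

```python
from typing import List, Union

# Hypothetical prefix encoding: ("CONN", word) is a binary connective that
# applies to the next two items; a plain string is a basic assertion.
Token = Union[str, tuple]


def parse_prefix(tokens: List[Token]):
    """Turn a prefix token list into nested (word, left, right) tuples."""
    def walk(i: int):
        tok = tokens[i]
        if isinstance(tok, tuple) and tok[0] == "CONN":
            left, i = walk(i + 1)
            right, i = walk(i)
            return (tok[1], left, right), i
        return tok, i + 1
    tree, end = walk(0)
    if end != len(tokens):
        raise ValueError("extra tokens after a complete expression")
    return tree


def assertions(tree) -> List[str]:
    """Leaves of the tree: the basic assertions, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, left, right = tree
    return assertions(left) + assertions(right)


def infix(tree) -> str:
    """Render with each connective word placed between its two conjuncts."""
    if isinstance(tree, str):
        return tree
    word, left, right = tree
    return f"[{infix(left)} {word} {infix(right)}]"


tokens = [("CONN", ","), "Today, she had no cough",
          ("CONN", "and"), "Today, she had no chest pain",
          "Today, she had no shortness of breath"]
tree = parse_prefix(tokens)
print(assertions(tree))
print(infix(tree))
```

Because the prefix form is unambiguous without brackets, the nesting of the second conjunct (chest pain "and" shortness of breath) is recovered purely from token order.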

The second major task of the Regularization component is to:

  • associate each basic decomposed ASSERTION / FRAGMENT with a medical statement type.
  • determine the medical information format type of each basic statement (ASSERTION / FRAGMENT) from the medical classes of all its words and from its context, such as the document section or the surrounding text. For example, the word penicillin
    1. in section ALLERGIES may mean the patient is allergic to penicillin (i.e. it belongs to the medical information format type Patient State).
    2. in section PLAN (or in similar assertions with a future tense) may mean that the patient has been advised to take penicillin but has not yet taken it (i.e. it belongs to information format type Treatment with an implied future).
    3. in section MEDICATIONS may mean the patient has been taking penicillin during his/her stay in the hospital (i.e. it belongs to information format type Treatment).
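The section-dependent readings of penicillin can be illustrated with a toy lookup. This is a deliberately simplified sketch built only from the three cases listed above. The real component also uses the medical classes of all the words in the statement. The table `SECTION_RULES`, the function `drug_format_type` and its `future_tense` flag are hypothetical, and the function assumes the word has already been identified as a medication.

```python
from typing import Optional

# Hypothetical rule table derived only from the three penicillin cases in the
# text; it is not the component's actual rule set.
SECTION_RULES = {
    "ALLERGIES": "Patient State",
    "PLAN": "Treatment (implied future)",
    "MEDICATIONS": "Treatment",
}


def drug_format_type(section: str, future_tense: bool = False) -> Optional[str]:
    """Guess the format type of a statement that mentions a medication.

    Returns None when no rule in this toy table applies.
    """
    if future_tense:
        # "similar assertions with a future tense" are read like PLAN
        return SECTION_RULES["PLAN"]
    return SECTION_RULES.get(section.upper())


for sec in ("ALLERGIES", "plan", "MEDICATIONS"):
    print(sec, "->", drug_format_type(sec))
```

A dictionary keeps the section-to-type mapping declarative, so further sections could be added without touching the function. A realistic version would combine it with word-class and tense information rather than rely on the section alone.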

Copyright © 2005 by Medical Language Processing, LLC.