The MLP website has two purposes. For those in the healthcare field who need timely access to patient information from narrative documents, it provides a system based on Natural Language Processing (NLP) that they can test and possibly adapt or extend for their particular needs. For others, with an interest in language per se and in how the vast data of language can be organized for computer processing, the MLP site opens for public examination and use the results of 30-40 years of research and development in NLP, performed primarily at New York University under the aegis of the Linguistic String Project (LSP).

What is the MLP?
The MLP (Medical Language Processor) is a system that transforms free-text clinical documents into a structured XML representation of the information in the documents. Document sentences are parsed, further processed to eliminate ambiguities, and mapped into medically labeled structures called Information Format Units (IFUs). The IFUs are enriched by the addition of medical knowledge tags drawn from the Structured Health Markup Language (SHML). In this form they become Health Information Units (HIUs), the basic unit of description in the final representation. Processed documents are installed in a clinician-oriented viewer that gives users selective access to the textual information needed for patient care, or they can be used in other applications.
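As a rough illustration of the last two steps described above (an IFU enriched with tags to become an HIU, then written out as XML), the following Python sketch may help. It is not the MLP's actual code or output format: the class name, the slot labels (FINDING, TIME), the tag values, and the XML element and attribute names are all hypothetical, chosen only to show the general shape of the transformation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class InformationFormatUnit:
    """Hypothetical stand-in for an IFU: labeled slots filled from one parsed sentence."""
    slots: dict  # slot label -> text taken from the sentence


# Hypothetical tag table. The real SHML tag names are not given in the text.
ILLUSTRATIVE_TAGS = {"fever": "SIGN-SYMPTOM", "chest": "BODY-PART"}


def to_hiu(ifu, tags):
    """Build an XML element for the IFU, attaching a tag wherever one is known."""
    hiu = ET.Element("HIU")  # assumed element name
    for label, text in ifu.slots.items():
        slot = ET.SubElement(hiu, label)
        slot.text = text
        tag = tags.get(text.lower())
        if tag is not None:
            slot.set("tag", tag)  # assumed attribute name
    return hiu


ifu = InformationFormatUnit(slots={"FINDING": "fever", "TIME": "yesterday"})
xml_text = ET.tostring(to_hiu(ifu, ILLUSTRATIVE_TAGS), encoding="unicode")
print(xml_text)
```

In this sketch the enrichment step only adds an attribute to slots whose text is found in the tag table; untagged slots pass through unchanged, so no information from the sentence is lost.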

The MLP was developed in stages. First, a general NLP parser was developed, along with a computer grammar of English and an associated English lexicon (cf. Sager, Natural Language Information Processing: A computer grammar of English and its application, Addison-Wesley 1981). The next step was to facilitate specialization of the processing for texts in a given field (medicine) by adding a further level of classification to words in the lexicon and developing a "sublanguage" component of the grammar (see Sager et al., Medical Language Processing: Computer Management of Narrative Data, Addison-Wesley 1987). Processing was later extended to neighboring languages: French, German and Dutch.

More recently, XML has been incorporated into the system in two ways: tagging lexical entries with their specific medical content (using a Medical Tag Hierarchy), and incorporating the results of tagging into an XML representation of the parsed documents. The tags support retrieval and display in terms of categories familiar to physicians.
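To make the idea of retrieval by category concrete, here is a minimal Python sketch of a tag hierarchy lookup. The tags, the parent links, and the lexicon entries are invented for illustration and are not taken from the actual Medical Tag Hierarchy; the point is only that a word tagged with a specific category can also be retrieved under any broader category above it.

```python
# Hypothetical fragment of a tag hierarchy: child tag -> parent tag.
PARENT = {
    "ANTIBIOTIC": "MEDICATION",
    "ANALGESIC": "MEDICATION",
    "MEDICATION": "TREATMENT",
    "SURGERY": "TREATMENT",
}

# Hypothetical lexicon entries: word -> most specific tag.
LEXICON_TAGS = {
    "penicillin": "ANTIBIOTIC",
    "aspirin": "ANALGESIC",
    "appendectomy": "SURGERY",
    "cough": "SYMPTOM",
}


def ancestors(tag):
    """Return the tag followed by each of its broader categories, most specific first."""
    chain = [tag]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def words_in_category(words, category):
    """Keep the words whose tag is the category itself or falls under it."""
    return [
        w for w in words
        if w in LEXICON_TAGS and category in ancestors(LEXICON_TAGS[w])
    ]


document_words = ["penicillin", "cough", "appendectomy", "aspirin"]
print(words_in_category(document_words, "MEDICATION"))
print(words_in_category(document_words, "TREATMENT"))
```

A query for the broader category returns everything a query for the narrower one does, plus entries tagged elsewhere under the same parent.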

The MLP includes:
     English healthcare syntactic lexicon and medically tagged lexicon,
     the MLP C++ parser,
     parsing with English medical grammar,
     selection with medical cooccurrence patterns,
     English transformation,
     syntactic regularization, and
     mapping into medical information format structures.

It also includes a set of XML tools for browsing and display.