The MLP Preprocessor

The MLP Preprocessor standardizes input patient documents (in text format) for further MLP and other computer processing. It is a semi-automatic process. It identifies possible spelling errors, abbreviations, and all forms of patient, family member or friend, staff, facility, and geographic names for anonymization. It converts numbers, units, and dates into ANSI standard format. Most importantly, it identifies sentence-ending punctuation and assigns an ID to each sentence (SID).

The MLP preprocessor is usually institution-specific. For each institution it builds a list of spelling corrections, a list of abbreviations (acronyms, shorthand, ...) and their full forms, and a list of all identity forms of names (for example, Mr. John Fitzgerald Smith, John Smith, Mr. Smith, Mr. J.F. Smith, John, ..., all pointing to the same person within a document set). These lists are cumulative.
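The mapping from several name forms to one person can be sketched as a lookup table. This is only an illustrative sketch, not the preprocessor's actual implementation: the table contents, the token format `[PATIENT_1]`, and the function name `anonymize` are all assumptions made for the example.

```python
import re

# Hypothetical alias table: each surface form of a name points to one token.
ALIASES = {
    "Mr. J.F. Smith": "[PATIENT_1]",
    "John Smith": "[PATIENT_1]",
    "Mr. Smith": "[PATIENT_1]",
}


def anonymize(text, aliases):
    """Replace every known name form in text with its anonymized token."""
    # Try longer forms first so a long form is not partially replaced
    # by a shorter one that it contains.
    forms = sorted(aliases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(form) for form in forms))
    return pattern.sub(lambda match: aliases[match.group(0)], text)


note = "Mr. J.F. Smith was admitted. Mr. Smith reports pain."
print(anonymize(note, ALIASES))
```

Because the table is plain data, entries found in later documents can simply be added to it, which matches the cumulative nature of the lists.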

The MLP Preprocessor builds for each institution, cumulatively:
     List of spelling errors and corrections,
     List of abbreviations and their full forms (cf. Abbreviations list),
     Lists of proper names (patient, family and friend, physician, facility, and location/geographic names) and their anonymized tokens,
     List of numbers, dates and units and their standard ANSI forms,
     List of dose forms (cf. Dose list),
     List of medications,
     List of organisms,
     List of words not found in the MLP dictionary, etc.

The MLP preprocessor is intended to suggest changes rather than impose them. The lists above are constructed so that the MLP can preserve the original source text.
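One way to picture a suggestive, source-preserving design is to record each proposed change next to the character span it refers to, leaving the input text unchanged. The sketch below works under that assumption; the `Suggestion` record, the `suggest` function, and the sample lists are hypothetical and are not taken from the MLP software.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    start: int      # offset of the first character in the original text
    end: int        # offset one past the last character
    original: str   # the text exactly as it appears in the source
    proposed: str   # the suggested correction or full form
    kind: str       # "spelling" or "abbreviation"


def suggest(text, spelling, abbreviations):
    """Return suggestions for text without modifying it."""
    suggestions = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group(0)
        key = word.lower()
        if key in spelling:
            suggestions.append(
                Suggestion(match.start(), match.end(), word, spelling[key], "spelling"))
        elif key in abbreviations:
            suggestions.append(
                Suggestion(match.start(), match.end(), word, abbreviations[key], "abbreviation"))
    return suggestions


# Hypothetical institution lists.
spelling = {"diabetis": "diabetes"}
abbreviations = {"pt": "patient"}

text = "Pt has diabetis."
for s in suggest(text, spelling, abbreviations):
    print(s.kind, repr(s.original), "->", repr(s.proposed))
```

Since each record keeps the offsets and the original string, a reviewer can accept or reject a suggestion, and the source wording can always be recovered.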

Assigning a unique ID to each sentence helps the MLP track what happens to the sentence during processing and link a health information unit (HIU) to its original sentence (see Viewer).
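A minimal sketch of SID assignment follows. It assumes a naive boundary rule (a period, exclamation mark, or question mark followed by whitespace) and a hypothetical SID format of document ID plus sentence number. A real splitter would also need to handle abbreviations such as "Mr." that end in a period, which this sketch does not attempt.

```python
import re


def assign_sids(doc_id, text):
    """Split text into sentences and pair each one with a sentence ID."""
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [part for part in parts if part]
    # Hypothetical SID format: "<document id>.<sentence number>".
    return [(f"{doc_id}.{n}", sentence)
            for n, sentence in enumerate(sentences, start=1)]


report = "Patient admitted. No fever! Discharged home?"
sid_table = dict(assign_sids("D1", report))

# A downstream unit that stores only the SID can be traced to its sentence.
print(sid_table["D1.2"])
```

Storing the SID with each derived unit is what allows a later stage to look up the sentence it came from.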