The MLP Preprocessor standardizes input patient documents (in text format) for further MLP and other computer processing. Preprocessing is semi-automatic. It identifies possible spelling errors, abbreviations, and all forms of patient, family member or friend, staff, facility, and geographic names for anonymization. It converts numbers, units, and dates into ANSI standard format. Most importantly, it identifies sentence punctuation and assigns an ID to each sentence (SID).
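As an illustration of the date standardization step, here is a minimal Python sketch. The text above does not say which ANSI format is targeted, so the YYYY-MM-DD output is an assumption, as are the two input layouts and the function name normalize_dates; none of them come from the MLP software itself.

```python
import re
from datetime import datetime

# Hypothetical input layouts; a real institution would need its own set.
DATE_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%m/%d/%Y"),            # 3/7/2004
    (re.compile(r"\b[A-Z][a-z]+ \d{1,2}, \d{4}\b"), "%B %d, %Y"),      # March 9, 2004
]


def normalize_dates(text):
    """Rewrite recognized dates as YYYY-MM-DD (assumed target format)."""
    for pattern, layout in DATE_PATTERNS:
        def to_standard(match, layout=layout):
            try:
                parsed = datetime.strptime(match.group(0), layout)
            except ValueError:
                # Looks like a date but is not one: leave the text untouched.
                return match.group(0)
            return parsed.strftime("%Y-%m-%d")

        text = pattern.sub(to_standard, text)
    return text


print(normalize_dates("Seen on 3/7/2004 and again March 9, 2004."))
```

Anything that matches a pattern but fails to parse is left as written, which fits the semi-automatic nature of the step: doubtful cases stay visible for a human reviewer.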
The MLP Preprocessor is usually institution-specific. For each institution it builds a list of spelling corrections, a list of abbreviations (acronyms, shorthand, ...) with their full forms, and a list of all identity forms of each name (for example, Mr. John Fitzerald Smith, John Smith, Mr. Smith, Mr. J.F. Smith, John, ... in a document set, all pointing to the same person). These lists are found to be cumulative.
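The MLP Preprocessor builds for each institution, cumulatively:
- a list of spelling corrections
- a list of abbreviations and their full forms
- a list of all identity forms of names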
The MLP Preprocessor is intended to be suggestive rather than prescriptive. The lists above are constructed so that the MLP can preserve the original sources.
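One way to picture a suggestive preprocessor that preserves the original source is to record each proposed change as a span over the untouched text. The sketch below assumes this design; the Suggestion class, the function names, the sample lexicon entries, and the [PATIENT_1] placeholder are all hypothetical and are not taken from the MLP software.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    start: int      # offset of the original text span
    end: int
    original: str   # text exactly as it appears in the source
    proposed: str   # suggested replacement
    kind: str       # e.g. "abbreviation", "spelling", "name"


def suggest(text, lexicon, kind):
    """Find every lexicon entry in text; the text itself is never modified."""
    found = []
    for surface, proposed in lexicon.items():
        # Lookarounds keep "pt" from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        for m in re.finditer(pattern, text):
            found.append(Suggestion(m.start(), m.end(), surface, proposed, kind))
    return found


def apply_suggestions(text, suggestions):
    """Build a new string from accepted suggestions; longest match wins on overlap."""
    ordered = sorted(suggestions, key=lambda s: (s.start, -(s.end - s.start)))
    out, cursor = [], 0
    for s in ordered:
        if s.start < cursor:
            continue  # overlaps a span that was already replaced
        out.append(text[cursor:s.start])
        out.append(s.proposed)
        cursor = s.end
    out.append(text[cursor:])
    return "".join(out)


# Hypothetical institution lists, grown cumulatively over time.
abbreviations = {"pt": "patient", "c/o": "complains of", "SOB": "shortness of breath"}
names = {"Mr. John Smith": "[PATIENT_1]", "John Smith": "[PATIENT_1]"}

source = "pt c/o SOB, seen by Mr. John Smith"
proposals = suggest(source, abbreviations, "abbreviation") + suggest(source, names, "name")
print(apply_suggestions(source, proposals))
```

Because every suggestion carries its offsets and the original wording, a reviewer can accept or reject entries one at a time, and the source document can always be recovered.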
Assigning a unique ID to each sentence helps the MLP track what happens to the sentence during processing, and links each health information unit (HIU) back to its original sentence (see Viewer).
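A minimal Python sketch of sentence ID assignment and the HIU link follows. The SID layout (document ID, a dot, then a running number), the naive splitting rule, and the dictionary shape used for an HIU are assumptions made for illustration only.

```python
import re


def assign_sids(doc_id, text):
    """Split at '.', '!' or '?' followed by whitespace and give each sentence an SID."""
    sentences = {}
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    for n, piece in enumerate(pieces, start=1):
        if piece:
            sentences[f"{doc_id}.{n}"] = piece  # hypothetical SID layout
    return sentences


sentences = assign_sids("D1", "Patient admitted. Fever noted on day two. Discharged home.")

# A hypothetical HIU record that carries the SID of the sentence it came from.
hiu = {"sid": "D1.2", "finding": "fever"}
print(sentences[hiu["sid"]])
```

The splitting rule here is deliberately simple and would break a sentence after a title such as "Mr."; cases like that are one reason sentence punctuation has to be identified with care before SIDs are assigned.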