Learning parsers for Natural Language Processing

By "Zow" Terry Brugger

For CS-471, Fall '96

	The purpose of this project was to create a parser which would
extend its lexicon by interacting with the user to learn new words
whenever it attempted to parse a word it didn't know. The parser was
based on my previous work for PSYC-526, which is available
separately. The project was broken up into three areas:
1. Extend the existing parser to provide a user-friendly interface. The
underlying Lisp interpreter should be relatively transparent to the
user.
2. Improve the lexicon to accommodate more than just the meaning of the
lexeme. Include information on the part of speech, verb form, and
other relevant information.
3. Have the parser learn new words when it encounters a word which it
doesn't recognize.

All three parts were successfully completed. The improved engine is
available in Appendix A and the improved dictionary is available in
Appendix B.

	Part one was expected to be the simplest part of the project, but
in fact it turned out to be the most difficult. The Lisp reader is
designed to read Lisp expressions (atoms and lists), not arbitrary
strings of text. In the end, though, I was able to accomplish
everything I intended. The read-line function is the workhorse behind
the input; however, I had to borrow a function from the Eliza program
to convert the resulting string into a list to be parsed. Output was
much easier, as the princ function did most of what I needed. I did,
however, need to write a function to print a list of strings so that
only the contents of the strings were printed.
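The input and output helpers described above might look roughly like the following Common Lisp sketch. It is not the project's code: the names `punctuation-p`, `string-to-word-list`, `read-sentence`, and `print-strings` are hypothetical, and the string-to-list step assumes the common Eliza-style trick of replacing punctuation with spaces, wrapping the line in parentheses, and handing it to the Lisp reader with `read-from-string`.

```lisp
;;; Hypothetical I/O helpers for a string-based front end to a Lisp parser.

(defun punctuation-p (char)
  "True if CHAR is punctuation that the Lisp reader would misinterpret."
  (find char ".,;:`!?#|-()\\\"'"))

(defun string-to-word-list (line)
  "Turn LINE into a list of Lisp objects: punctuation becomes spaces, the
result is wrapped in parentheses, and the Lisp reader does the splitting."
  (let ((*read-eval* nil))              ; never evaluate #. forms from input
    (read-from-string
     (concatenate 'string "(" (substitute-if #\Space #'punctuation-p line) ")"))))

(defun read-sentence (&optional (stream *standard-input*))
  "Read one line from STREAM with READ-LINE and convert it to a word list."
  (string-to-word-list (read-line stream nil "")))

(defun print-strings (strings &optional (stream *standard-output*))
  "PRINC each string in STRINGS separated by spaces, so only their text
appears (no quotation marks or list parentheses), then end the line."
  (loop for (s . rest) on strings
        do (princ s stream)
           (when rest (princ " " stream)))
  (terpri stream)
  (values))
```

For example, `(string-to-word-list "Set x to 25.")` yields `(SET X TO 25)`, since the reader upcases symbols and reads `25` as an integer.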
	I simplified the implementation by wrapping the actual parsing
"engine" in another function, so all that is required of the user is
to call the parser function from the Lisp prompt. My test output,
which demonstrates the interface, is in Appendix C.
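A sketch of the wrapper idea follows, assuming a hypothetical `parse-words` placeholder in place of the real engine and a simple space-based `split-words` tokenizer rather than the Eliza-derived conversion. Passing the engine and the streams as keyword arguments is an assumption made here so the loop can be exercised without a live terminal.

```lisp
;;; Hypothetical top-level wrapper: the user just evaluates (parser).

(defun split-words (line)
  "Split LINE on spaces into a list of non-empty word strings."
  (loop for start = 0 then (1+ pos)
        for pos = (position #\Space line :start start)
        for word = (subseq line start pos)
        unless (string= word "") collect word
        while pos))

(defun parse-words (words)
  "Placeholder for the real parsing engine: reports the word count."
  (format nil "Parsed ~D word~:P." (length words)))

(defun parser (&key (engine #'parse-words)
                    (in *standard-input*)
                    (out *standard-output*))
  "Prompt for sentences until the user types quit or input ends, passing
each sentence's words to ENGINE and printing whatever it returns."
  (loop
    (princ "> " out)
    (finish-output out)
    (let ((line (read-line in nil nil)))
      (when (or (null line)
                (string-equal (string-trim " " line) "quit"))
        (return (values)))
      (format out "~A~%" (funcall engine (split-words line))))))
```

Keeping the prompt loop separate from the engine means the engine can be replaced or extended (for example with learning) without changing what the user types at the Lisp prompt.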

	Part two actually turned out to be the easiest part of the
project. The dictionary is implemented as an a-list. While an a-list
would be unwieldy and slow for a large dictionary, since every lookup
is a linear search, it works well for a small research program such as
this one. Originally, the value stored with each word or phrase in the
a-list was a string representing its meaning. In part two I replaced
that string with a Lisp structure which stores the meaning as a
string. The structure also stores a possible action to perform (stored
as a Lisp function call), the part of speech the word or phrase belongs
to, its agreement (such as first person singular), and, for verbs, the
verb form and subcategory.
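A minimal sketch of the lexeme structure and a-list dictionary is shown below. The slot names mirror the fields listed above, but their exact spelling, the keyword values, and the two sample entries are illustrative assumptions. `assoc` is given `:test #'equal` so that a multi-word phrase could be stored as a list key.

```lisp
;;; Hypothetical lexeme record and a-list dictionary; slot names are illustrative.

(defstruct lexeme
  meaning          ; string describing the word or phrase
  action           ; optional Lisp form to run when the word is parsed
  part-of-speech   ; e.g. :noun, :verb
  agreement        ; e.g. :third-singular
  verb-form        ; verbs only, e.g. :present
  subcategory)     ; verbs only, e.g. :intransitive

(defparameter *lexicon*
  (list (cons 'dog (make-lexeme :meaning "a domesticated canine"
                                :part-of-speech :noun
                                :agreement :third-singular))
        (cons 'barks (make-lexeme :meaning "makes a sharp cry"
                                  :part-of-speech :verb
                                  :agreement :third-singular
                                  :verb-form :present
                                  :subcategory :intransitive)))
  "Each entry is (WORD . LEXEME); lookups are a linear search with ASSOC.")

(defun lookup (word)
  "Return the lexeme for WORD (a symbol, or a list for a phrase), or NIL."
  (cdr (assoc word *lexicon* :test #'equal)))
```

Adding a field later only means adding a slot to `defstruct`; existing `make-lexeme` calls keep working because unspecified slots default to `nil`.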
	Using a structure also makes it easy to extend the lexeme in
future development, for instance to add truth and context
information. While most of the structure's fields are not used in this
version of the parser, they allow important information to be stored
for use by future versions. The most promising field is the action,
which allows a Lisp function call to be stored and executed when the
sentence is parsed. This would most likely be extended through the use
of variables. For example, the word "set" might have the following
action:

(setq (? N) (? O))

so that when the sentence "set x to 25" is parsed, the parser would
execute the function call:

(setq x 25)
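One way the proposed variable substitution could work is sketched below. The feature was not implemented in this version, so `instantiate-action`, the `(VAR . VALUE)` binding format, and the global `*x*` (standing in for the sentence's `x` so that `eval` assigns to a declared variable) are all assumptions. Deciding which words of the sentence bind to `N` and `O` is left to the parser; here the bindings are supplied by hand.

```lisp
;;; Hypothetical sketch of variable substitution in a lexeme's action.

(defvar *x* nil "Global variable standing in for the sentence's x.")

(defun variable-slot-p (form)
  "True if FORM is a slot of the shape (? VAR)."
  (and (consp form)
       (eq (first form) '?)
       (consp (rest form))
       (null (cddr form))))

(defun instantiate-action (template bindings)
  "Copy TEMPLATE (assumed to be built from proper lists), replacing each
(? VAR) slot with VAR's value from BINDINGS, an a-list of (VAR . VALUE)
pairs.  Slots with no binding are left unchanged."
  (cond ((variable-slot-p template)
         (let ((binding (assoc (second template) bindings)))
           (if binding (cdr binding) template)))
        ((consp template)
         (mapcar (lambda (sub) (instantiate-action sub bindings)) template))
        (t template)))
```

With the bindings `((n . *x*) (o . 25))`, the template `(setq (? n) (? o))` becomes `(SETQ *X* 25)`, and passing that form to `eval` sets `*x*` to 25.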

	Finally, learning was implemented. Learning was the main reason
for building a user interface at this stage: since the user must
interact with the parser for it to learn, it is natural for the rest
of the parsing process to be interactive as well. The basic idea is
that if the parser breaks a sentence down to a word which is not in
the dictionary, it asks the user for the essential information about
that word (almost all of the fields in the lexeme structure) and adds
it to the dictionary. The learn function, which does all of this, is
easily modified to accommodate new lexeme fields. Through very basic
flow control, it prompts only for the fields that apply; for
instance, the verb form is requested only if the word is a verb.
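The learning step could be sketched as follows. The prompts, the `ask` helper, and `lookup-or-learn` are hypothetical; answers are kept as plain strings for simplicity, and the streams are keyword arguments only so that the function can be driven from a string when testing.

```lisp
;;; Hypothetical LEARN: ask only for the fields that apply, then extend the lexicon.

(defstruct lexeme
  meaning action part-of-speech agreement verb-form subcategory)

(defvar *lexicon* '() "A-list of (WORD . LEXEME) entries.")

(defun ask (prompt in out)
  "Print PROMPT to OUT and return the next line from IN, trimmed of spaces."
  (format out "~A: " prompt)
  (finish-output out)
  (string-trim " " (read-line in nil "")))

(defun learn (word &key (in *standard-input*) (out *standard-output*))
  "Interactively build a lexeme for WORD, add it to *LEXICON*, and return it."
  (format out "I don't know the word ~(~A~).~%" word)
  (let* ((meaning (ask "Meaning" in out))
         (pos (ask "Part of speech" in out))
         (entry (make-lexeme :meaning meaning
                             :part-of-speech pos
                             :agreement (ask "Agreement" in out))))
    ;; Basic flow control: verb-only fields are requested only for verbs.
    (when (string-equal pos "verb")
      (setf (lexeme-verb-form entry) (ask "Verb form" in out)
            (lexeme-subcategory entry) (ask "Subcategory" in out)))
    (push (cons word entry) *lexicon*)
    entry))

(defun lookup-or-learn (word &rest io)
  "Return WORD's lexeme, calling LEARN first if the word is unknown.
IO holds any :IN / :OUT keyword arguments to pass on to LEARN."
  (or (cdr (assoc word *lexicon* :test #'equal))
      (apply #'learn word io)))
```

Supporting a new lexeme field in this sketch means adding one slot and one `ask` call, placed inside the `when` if the field applies only to verbs.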
	Appendix C demonstrates the learning parser in action. In the
first sentence, every word is understood, and the information on the
sentence is printed out. In the second, which is similar to the first,
a few words are not understood, so after they are added to the
lexicon, the meaning of the entire sentence is printed. In the final
sentence, none of the words are previously known, so all of them are
learned and the meaning of the entire sentence is again returned. This
sentence also demonstrates that the verb form and subcategory are
requested only for verbs.

	Overall, this project turned out as I expected. Very useful
additions were made to the parser and I learned a great deal about
Natural Language Processing, LISP and the field of AI in general.


12/2/96 - Original paper finished

1/17/97 - converted to HTML

Last Modified: 5/22/2000