Sean A Fulop edited this page Oct 28, 2016 · 4 revisions

If you are reading this, the Penn Treebank is probably familiar to you. Briefly, it is a large corpus of sentences annotated with complete syntactic structures and parts of speech. The annotation uses a rich set of part-of-speech tags and n-ary branching tree structures. The Treebank is also proprietary, so you can only use our code if you already have access to it.
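
The Treebank stores each parsed sentence in a Lisp-style bracketed notation, sometimes wrapped in an unlabeled outer bracket. The sketch below reads one such tree into an n-ary structure so the tags and branching are easy to see. It is a minimal illustration under that assumption, not code from PennTreebank-Transcoder. `Tree`, `parse_bracketed` and `tagged_words` are hypothetical names chosen for this example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Tokens are "(", ")", or any run of characters that is neither whitespace nor a parenthesis.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


@dataclass
class Tree:
    label: str                                   # phrase label or POS tag ("" for an unlabeled wrapper)
    children: List["Tree"] = field(default_factory=list)
    word: Optional[str] = None                   # set only on preterminal (POS-tag) nodes

    def is_preterminal(self) -> bool:
        return self.word is not None


def parse_bracketed(text: str) -> Tree:
    """Parse one bracketed tree into an n-ary Tree (hypothetical helper)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def expect(tok: str) -> None:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != tok:
            raise ValueError(f"expected {tok!r} at token {pos}")
        pos += 1

    def is_atom(i: int) -> bool:
        return i < len(tokens) and tokens[i] not in ("(", ")")

    def parse_node() -> Tree:
        nonlocal pos
        expect("(")
        label = ""
        if is_atom(pos):                          # optional label
            label = tokens[pos]
            pos += 1
        if is_atom(pos):                          # (TAG word): a preterminal node
            node = Tree(label, word=tokens[pos])
            pos += 1
        else:                                     # any number of child subtrees (n-ary)
            children = []
            while pos < len(tokens) and tokens[pos] == "(":
                children.append(parse_node())
            node = Tree(label, children)
        expect(")")
        return node

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def tagged_words(tree: Tree) -> List[Tuple[str, str]]:
    """Return the (word, POS tag) pairs at the leaves, left to right."""
    if tree.is_preterminal():
        return [(tree.word, tree.label)]
    pairs: List[Tuple[str, str]] = []
    for child in tree.children:
        pairs.extend(tagged_words(child))
    return pairs


example = "( (S (NP-SBJ (DT The) (NN dog)) (VP (VBZ barks)) (. .)) )"
tree = parse_bracketed(example)
print(tagged_words(tree))   # [('The', 'DT'), ('dog', 'NN'), ('barks', 'VBZ'), ('.', '.')]
```

Note that the `S` node here has three children, which illustrates the n-ary branching. For real work, NLTK's `Tree.fromstring` reads the same notation.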

The PennTreebank-Transcoder code is still in the final stages of development, but it basically works as planned. It needs some more tweaking before it can parse all the Treebank sentences without errors. This is why it is on GitHub, so tweak away.

The purpose of the code is to transcode the annotated Penn Treebank sentences into pairs of structures. On one side, we output a "lambda term": a term of the lambda calculus that represents the rough semantic structure of the original sentence, in particular its function-argument structure. Strictly speaking, we extend the vocabulary of lambda calculus terms. As a result, some of our output terms are no longer legal lambda terms, but they serve a purpose that will be described later.
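
To make "function-argument structure" more concrete, here is a minimal sketch of plain lambda terms in Python. It rests on our own assumptions and does not reproduce the transcoder's actual output format. The classes `Var`, `Const`, `Lam` and `App`, the helper functions, and the word-to-term mapping are all hypothetical. The extended term vocabulary is not modeled. In this sketch, a head word acts as a function and its dependents are its arguments, applied one at a time (currying).

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    name: str            # hypothetical: a word-level constant such as "dog"


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fn: "Term"           # the function (e.g. a head word)
    arg: "Term"          # its argument


Term = Union[Var, Const, Lam, App]


def show(t: Term) -> str:
    if isinstance(t, (Var, Const)):
        return t.name
    if isinstance(t, Lam):
        return f"(\\{t.param}. {show(t.body)})"
    return f"({show(t.fn)} {show(t.arg)})"


def apply_all(head: Term, *args: Term) -> Term:
    """Curried application: apply_all(f, a, b) builds ((f a) b)."""
    term = head
    for arg in args:
        term = App(term, arg)
    return term


def free_vars(t: Term) -> Set[str]:
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Const):
        return set()
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.param}
    return free_vars(t.fn) | free_vars(t.arg)


def fresh(base: str, avoid: Set[str]) -> str:
    i = 0
    while f"{base}{i}" in avoid:
        i += 1
    return f"{base}{i}"


def subst(t: Term, name: str, value: Term) -> Term:
    """Capture-avoiding substitution t[name := value]."""
    if isinstance(t, Var):
        return value if t.name == name else t
    if isinstance(t, Const):
        return t
    if isinstance(t, App):
        return App(subst(t.fn, name, value), subst(t.arg, name, value))
    if t.param == name:                   # name is rebound here, so nothing to replace
        return t
    if t.param in free_vars(value):       # rename the binder so it cannot capture a free variable
        new = fresh(t.param, free_vars(value) | free_vars(t.body) | {name})
        return Lam(new, subst(subst(t.body, t.param, Var(new)), name, value))
    return Lam(t.param, subst(t.body, name, value))


def whnf(t: Term) -> Term:
    """Reduce the leftmost-outermost redex until the head is no longer a redex."""
    while isinstance(t, App):
        fn = whnf(t.fn)
        if not isinstance(fn, Lam):
            return App(fn, t.arg)
        t = subst(fn.body, fn.param, t.arg)
    return t


def normalize(t: Term) -> Term:
    """Normal-order reduction to beta-normal form; loops forever if the term has none."""
    t = whnf(t)
    if isinstance(t, Lam):
        return Lam(t.param, normalize(t.body))
    if isinstance(t, App):
        return App(normalize(t.fn), normalize(t.arg))
    return t


# "The dog barks": the verb is the function and the subject noun phrase is its argument.
print(show(apply_all(Const("barks"), apply_all(Const("the"), Const("dog")))))   # (barks (the dog))

# "Mary saw John", with a hypothetical entry for "saw" that takes the object first, then the subject.
saw = Lam("y", Lam("x", apply_all(Const("see"), Var("x"), Var("y"))))
print(show(normalize(apply_all(saw, Const("John"), Const("Mary")))))           # ((see Mary) John)
```

Here the entry for "saw" is a lambda abstraction. Applying it to its two arguments and beta-reducing yields a plain function-argument term. The argument order is a convention picked only for this sketch.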
