Commit 4160ba6

Merge branch 'identifiers'
This PR allows either the imported functions or cli tool to be used fairly interchangeably for converting documents to relational facts. Also updates documentation to have better examples and streamlined information (quite a few functions were made anonymous since they were called by `makeIdentifiers`).

Major Changes:

* README.md is now README.rst
* Parameterized the options in `parse.writeBk`, `parse.setParam`, and `parse.getTarget`. Each of these may be specified as an argument to the `parse.makeIdentifiers` function.
* `files/doi.txt` added so there was a text version to show equivalence with `corpus.declaration()`.
* Lots of docstrings!
2 parents d6fef81 + 9aa27cb commit 4160ba6

11 files changed: +589 -214 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,7 @@
 *~
 train
 test
+files2

 bk.txt
 blockIDs.txt
@@ -115,3 +116,4 @@ venv.bak/

 # mypy
 .mypy_cache/
+

README.md

Lines changed: 0 additions & 70 deletions
This file was deleted.

README.rst

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+``rnlp``
+========
+
+*Relational NLP Preprocessing: A Python package and tool for converting text into a set of relational facts.*
+
+.. image:: https://img.shields.io/pypi/pyversions/rnlp.svg?style=flat-square
+.. image:: https://img.shields.io/pypi/v/rnlp.svg?style=flat-square
+.. image:: https://img.shields.io/pypi/l/rnlp.svg?style=flat-square
+.. image:: https://img.shields.io/readthedocs/rnlp/stable.svg?flat-square
+   :target: http://rnlp.readthedocs.io/en/stable/
+
+**Kaushik Roy** (`@kkroy36`_) and **Alexander L. Hayes** (`@batflyer`_)
+
+Installation
+------------
+
+Stable builds on PyPI
+
+.. code-block:: bash
+
+    pip install rnlp
+
+Development builds on GitHub
+
+.. code-block:: bash
+
+    pip install git+git://github.com/starling-lab/rnlp.git
+
+Quick-Start
+-----------
+
+``rnlp`` can be used either as a CLI tool or as an imported Python Package.
+
++---------------------------------------+--------------------------------------+
+| **CLI**                               | **Imported**                         |
++---------------------------------------+--------------------------------------+
+|.. code-block:: bash                   |.. code-block:: python                |
+|                                       |                                      |
+|  $ python -m rnlp -f files/doi.txt    |  from rnlp.corpus import declaration |
+|                                       |  import rnlp                         |
+|                                       |                                      |
+|                                       |  doi = declaration()                 |
+|                                       |  rnlp.converter(doi)                 |
++---------------------------------------+--------------------------------------+
+
+Text will be converted into relational facts, relations encoded are:
+
+- between blocks of size 'n' (i.e. 2 sentences) in the blocks.
+
+- between blocks of size n (i.e. 'n' sentences) and sentences in the blocks.
+
+- between sentences and words in the sentences.
+
+---
+
+The relationships currently encoded are:
+
+1. earlySentenceInBlock - sentence occurs within a third of the block length
+
+2. earlyWordInSentence - word occurs within a third of the sentence length
+
+3. lateSentenceInBlock - sentence occurs after two-thirds of the block length
+
+4. midWayWordInSentence - word occurs between a third and two-thirds of the sentence length
+
+5. nextSentenceInBlock - sentence that follows a sentence in a block
+
+6. nextWordInSentence - word that follows a word in a sentence in a block
+
+7. sentenceInBlock - sentence occurs in a block
+
+8. wordInSentence - word occurs in a sentence.
+
+9. wordString - the string contained in the word.
+
+10. partOfSpeech - the part of speech of the word.
+
+---
+
+Files contain a toy corpus (``files/``) and an image of a BoostSRL tree for predicting if a word in a sentence is the word "you".
+
+.. image:: https://raw.githubusercontent.com/starling-lab/rnlp/master/docs/img/output.png
+
+The tree says that if the word string contained in word 'b' is "you" then 'b' is the word "you". (This is of course true).
+A more interesting inference is the False branch that says that if word 'b' is an early word in sentence 'a' and word 'anon12035' is also an early word in sentence 'a' and if the word string contained in word 'anon12035' is "Thank", then the word 'b' has a decent chance of being the word "you". (The model was able to learn that the word "you" often occurs with the word "Thank" in the same sentence when "Thank" appears early in that sentence).
+
+.. _`@kkroy36`: https://github.com/kkroy36/
+.. _`@batflyer`: https://github.com/batflyer/
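
The positional relations listed in the README (early, midway, late) split a block or sentence into thirds. The sketch below is one way such a split could be computed. It is not taken from the ``rnlp`` source: the function name ``position_label``, the 0-based indexing, and the handling of items that fall exactly on a boundary are all assumptions made for illustration.

```python
def position_label(index, length):
    """Label a 0-based position as 'early', 'midway', or 'late'.

    Hypothetical helper, not part of rnlp. Assumed boundaries:
    'early'  -> position lies in the first third of the sequence,
    'late'   -> position lies at or beyond the two-thirds mark,
    'midway' -> everything in between.
    """
    if length <= 0 or not 0 <= index < length:
        raise ValueError("index must satisfy 0 <= index < length")
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    if 3 * index < length:
        return "early"
    if 3 * index >= 2 * length:
        return "late"
    return "midway"


# Label every word of a six-word sentence.
labels = [position_label(i, 6) for i in range(6)]
print(labels)
```

With six items, the first two positions are labelled early, the middle two midway, and the last two late. A different rounding rule would shift the boundaries for lengths that are not multiples of three.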

docs/source/index.rst

Lines changed: 91 additions & 7 deletions
@@ -3,18 +3,102 @@
 You can adapt this file completely to your liking, but it should at least
 contain the root `toctree` directive.

-Welcome to rnlp's documentation!
-================================
+``rnlp``
+========
+
+*Relational NLP Preprocessing*: A Python package and tool for converting text
+into a set of relational facts.
+
+:Authors:
+    Kaushik Roy (`@kkroy36 <https://github.com/kkroy36/>`_), Alexander L. Hayes (`@batflyer <https://github.com/batflyer/>`_)
+
+:Index: :ref:`genindex`
+:Modules: :ref:`modindex`
+:Source: `GitHub <https://github.com/starling-lab/rnlp>`_
+:Bugtracker: `GitHub Issues <https://github.com/starling-lab/rnlp/issues/>`_
+
+.. image:: https://img.shields.io/pypi/pyversions/rnlp.svg?style=flat-square
+.. image:: https://img.shields.io/pypi/v/rnlp.svg?style=flat-square
+.. image:: https://img.shields.io/pypi/l/rnlp.svg?style=flat-square
+.. image:: https://img.shields.io/readthedocs/rnlp/stable.svg?flat-square
+   :target: http://rnlp.readthedocs.io/en/stable/

 .. toctree::
    :maxdepth: 2
    :caption: Contents:

+Installation
+------------
+
+Stable builds on PyPI
+
+.. code-block:: bash
+
+    pip install rnlp
+
+Development builds on GitHub
+
+.. code-block:: bash
+
+    pip install git+git://github.com/starling-lab/rnlp.git
+
+Quick-Start
+-----------
+
+``rnlp`` can be used either as a CLI tool or as an imported Python Package.
+
++---------------------------------------+--------------------------------------+
+| **CLI**                               | **Imported**                         |
++---------------------------------------+--------------------------------------+
+|.. code-block:: bash                   |.. code-block:: python                |
+|                                       |                                      |
+|  $ python -m rnlp -f files/doi.txt    |  from rnlp.corpus import declaration |
+|                                       |  import rnlp                         |
+|                                       |                                      |
+|                                       |  doi = declaration()                 |
+|                                       |  rnlp.converter(doi)                 |
++---------------------------------------+--------------------------------------+
+
+Text will be converted into relational facts, relations encoded are:
+
+- between blocks of size 'n' (i.e. 2 sentences) in the blocks.
+
+- between blocks of size n (i.e. 'n' sentences) and sentences in the blocks.
+
+- between sentences and words in the sentences.
+
+---
+
+The relationships currently encoded are:
+
+1. earlySentenceInBlock - sentence occurs within a third of the block length
+
+2. earlyWordInSentence - word occurs within a third of the sentence length
+
+3. lateSentenceInBlock - sentence occurs after two-thirds of the block length
+
+4. midWayWordInSentence - word occurs between a third and two-thirds of the sentence length
+
+5. nextSentenceInBlock - sentence that follows a sentence in a block
+
+6. nextWordInSentence - word that follows a word in a sentence in a block
+
+7. sentenceInBlock - sentence occurs in a block
+
+8. wordInSentence - word occurs in a sentence.
+
+9. wordString - the string contained in the word.
+
+10. partOfSpeech - the part of speech of the word.
+
+---
+
+Files contain a toy corpus (``files/``) and an image of a BoostSRL tree for predicting if a word in a sentence is the word "you".

+.. image:: https://raw.githubusercontent.com/starling-lab/rnlp/master/docs/img/output.png

-Indices and tables
-==================
+The tree says that if the word string contained in word 'b' is "you" then 'b' is the word "you". (This is of course true).
+A more interesting inference is the False branch that says that if word 'b' is an early word in sentence 'a' and word 'anon12035' is also an early word in sentence 'a' and if the word string contained in word 'anon12035' is "Thank", then the word 'b' has a decent chance of being the word "you". (The model was able to learn that the word "you" often occurs with the word "Thank" in the same sentence when "Thank" appears early in that sentence).

-* :ref:`genindex`
-* :ref:`modindex`
-* :ref:`search`
+.. _`@kkroy36`: https://github.com/kkroy36/
+.. _`@batflyer`: https://github.com/batflyer/
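
To make the word-level relations in the documentation concrete (``wordInSentence``, ``wordString``, ``nextWordInSentence``), here is a rough sketch of how one tokenized sentence might be turned into facts. The identifier scheme (``s1_w1``), the argument order, and the trailing-period syntax are assumptions chosen for illustration; the facts that ``rnlp`` actually writes may look different.

```python
def sentence_facts(sentence_id, words):
    """Build illustrative relational facts for one tokenized sentence.

    Hypothetical helper, not part of rnlp. Word identifiers are formed
    as '<sentence_id>_w<position>' with 1-based positions.
    """
    word_ids = [f"{sentence_id}_w{i}" for i in range(1, len(words) + 1)]
    facts = []
    for word_id, word in zip(word_ids, words):
        # Membership of the word in its sentence, then its surface string.
        facts.append(f"wordInSentence({word_id},{sentence_id}).")
        facts.append(f'wordString({word_id},"{word}").')
    # Link each word to the word that immediately follows it.
    for current, following in zip(word_ids, word_ids[1:]):
        facts.append(f"nextWordInSentence({sentence_id},{current},{following}).")
    return facts


facts = sentence_facts("s1", ["Thank", "you"])
for fact in facts:
    print(fact)
```

A two-word sentence yields two membership facts, two string facts, and one ordering fact. A learner such as BoostSRL can then relate "Thank" and "you" through the sentence identifier they share.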

docs/source/rnlp.rst

Lines changed: 8 additions & 1 deletion
@@ -12,14 +12,21 @@ rnlp\.corpus module
     :undoc-members:
     :show-inheritance:

-rnlp\.parseInputCorpus module
+rnlp\.parse module
 -----------------------------

 .. automodule:: rnlp.parse
     :members:
     :undoc-members:
     :show-inheritance:

+rnlp\.textprocessing module
+-----------------------------
+
+.. automodule:: rnlp.textprocessing
+    :members:
+    :undoc-members:
+    :show-inheritance:

 Module contents
 ---------------
