Add page for Patapsco #2
base: master
@@ -0,0 +1,74 @@
<p>
<a href="https://github.com/hltcoe/patapsco">Patapsco</a> is a framework for running cross-language
information retrieval (CLIR) experiments, developed by the <a href="https://hltcoe.jhu.edu/">Human Language
Technology Center of Excellence (HLTCOE)</a> at Johns Hopkins University.
</p>
<p>
To get started with Patapsco, see the README in the <a href="https://github.com/hltcoe/patapsco">Patapsco repository</a>.
</p>
<h2 class="underline">Basic Usage</h2>
<p>
Patapsco reads the data sources for an experiment (documents, topics, and relevance judgments) from a
YAML config file or, equivalently, from a config dictionary in Python. See
<a href="https://github.com/hltcoe/patapsco/blob/master/samples/configs/irds_test.yml">this example config file</a>
for reference.
</p>
<p>
In each of the <kbd>documents</kbd>, <kbd>topics</kbd>, and <kbd>score</kbd> sections, set <kbd>format</kbd>
to <kbd>irds</kbd> in the <kbd>input</kbd> subsection to tell Patapsco to load the data with <kbd>ir_datasets</kbd>,
and set <kbd>path</kbd> to the dataset ID. In the <kbd>documents</kbd> and <kbd>topics</kbd> sections, the
<kbd>lang</kbd> value must match the language reported by <kbd>ir_datasets</kbd> through
<kbd>dataset.docs_lang()</kbd> and <kbd>dataset.queries_lang()</kbd>, respectively. Note that Patapsco uses
three-letter ISO 639-3 language codes, whereas ir_datasets reports two-letter ISO 639-1 codes, so the value
has to be converted (for example, <kbd>zh</kbd> becomes <kbd>zho</kbd> and <kbd>en</kbd> becomes <kbd>eng</kbd>).
</p>
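The conversion can be done with a small lookup table. The sketch below is a hypothetical helper, not part of Patapsco or ir_datasets: the name to_patapsco_lang and the ten-entry table are assumptions for illustration, so any language outside the table raises an error instead of guessing.

```python
from typing import Optional

# Hypothetical lookup table (an assumption, not taken from Patapsco or
# ir_datasets): ISO 639-1 code -> ISO 639-3 code for a few common languages.
ISO639_1_TO_3 = {
    "ar": "ara",
    "de": "deu",
    "en": "eng",
    "es": "spa",
    "fa": "fas",
    "fr": "fra",
    "ja": "jpn",
    "ko": "kor",
    "ru": "rus",
    "zh": "zho",
}


def to_patapsco_lang(iso639_1: Optional[str]) -> str:
    """Convert a two-letter ISO 639-1 code, as returned by ir_datasets'
    docs_lang()/queries_lang(), to the three-letter code Patapsco expects."""
    if not iso639_1:
        # A dataset without language information may report None.
        raise ValueError("the dataset does not report a language")
    try:
        return ISO639_1_TO_3[iso639_1.lower()]
    except KeyError:
        raise ValueError(f"no ISO 639-3 code known for {iso639_1!r}") from None


print(to_patapsco_lang("zh"))  # zho
print(to_patapsco_lang("en"))  # eng
```

With ir_datasets installed, the input could come from the dataset itself, e.g. to_patapsco_lang(ir_datasets.load("clirmatrix/zh/bi139-base/en/dev").docs_lang()), assuming that call returns "zh". For full language coverage, a library such as pycountry, whose languages.get(alpha_2=...) lookup exposes an alpha_3 attribute, could replace the hand-written table.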
${hl('''
documents:
  input:
    format: irds
    lang: zho
    path: clirmatrix/zh/bi139-base/en/dev
  process:
    inherit: text
Owner (review comment): Is this the field from a document to use?
  output: true

topics:
  input:
    format: irds
    lang: eng
    source: original
    encoding: utf8
    path: clirmatrix/zh/bi139-base/en/dev

Owner (review comment): The example file linked above also has a "queries" section. Is that optional?
score:
Owner (review comment): I presume
  input:
    format: irds
    path: clirmatrix/zh/bi139-base/en/dev
''')}
<p>
The same configuration can also be given as a Python dictionary. Please refer to the
Patapsco documentation for further information.
</p>
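As a sketch of that equivalence, the YAML example above could be written as the following dictionary. The nesting (for example, output sitting under documents) is inferred from the YAML and from dotted keys such as documents.input.path, and is an assumption rather than something this page spells out. How the dictionary is passed to Patapsco is deliberately left out, since this page does not name the entry point.

```python
import json

# Assumed equivalent of the YAML example; key nesting mirrors the dotted
# names documents.input.path, topics.input.lang, score.input.path, etc.
DATASET_ID = "clirmatrix/zh/bi139-base/en/dev"

config = {
    "documents": {
        "input": {"format": "irds", "lang": "zho", "path": DATASET_ID},
        "process": {"inherit": "text"},
        "output": True,  # YAML `true` becomes Python True
    },
    "topics": {
        "input": {
            "format": "irds",
            "lang": "eng",
            "source": "original",
            "encoding": "utf8",
            "path": DATASET_ID,
        },
    },
    "score": {
        "input": {"format": "irds", "path": DATASET_ID},
    },
}

# Print the dictionary to compare it against the YAML version.
print(json.dumps(config, indent=2))
```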
<table>
<tr><th>Patapsco config key</th><th>ir_datasets counterpart</th><th>Notes</th></tr>
<tr><td><kbd>documents</kbd></td><td><kbd>docs</kbd></td><td></td></tr>
<tr><td><kbd>documents.input.path</kbd></td><td>dataset ID</td><td></td></tr>
<tr><td><kbd>documents.input.lang</kbd></td><td><kbd>dataset.docs_lang()</kbd></td><td>Convert from ISO 639-1 to ISO 639-3</td></tr>
<tr><td><kbd>documents.process.inherit</kbd></td><td>the document field that holds the text to use</td><td></td></tr>
<tr><td><kbd>topics</kbd></td><td><kbd>queries</kbd></td><td></td></tr>
<tr><td><kbd>topics.input.path</kbd></td><td>dataset ID</td><td></td></tr>
<tr><td><kbd>topics.input.lang</kbd></td><td><kbd>dataset.queries_lang()</kbd></td><td>Convert from ISO 639-1 to ISO 639-3</td></tr>
<tr><td><kbd>score</kbd></td><td><kbd>qrels</kbd></td><td></td></tr>
<tr><td><kbd>score.input.path</kbd></td><td>dataset ID</td><td></td></tr>
</table>
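Following the table, the irds-specific input values could be derived from the dataset rather than typed by hand. The sketch below is hypothetical: irds_input_sections and FakeDataset are illustrative names, it only assumes an object exposing docs_lang() and queries_lang() as ir_datasets datasets do, and the ISO 639-1 to ISO 639-3 conversion is passed in as a function because neither library is assumed to provide one.

```python
from typing import Callable, Dict, Optional, Protocol


class IrdsDataset(Protocol):
    """The two ir_datasets Dataset methods this sketch relies on."""

    def docs_lang(self) -> Optional[str]: ...

    def queries_lang(self) -> Optional[str]: ...


def irds_input_sections(
    dataset_id: str,
    dataset: IrdsDataset,
    to_iso639_3: Callable[[Optional[str]], str],
) -> Dict[str, dict]:
    """Build the documents/topics/score input subsections per the table."""

    def irds_input(lang: Optional[str] = None) -> dict:
        section = {"format": "irds", "path": dataset_id}  # path = dataset ID
        if lang is not None:
            section["lang"] = lang
        return section

    return {
        "documents": {"input": irds_input(to_iso639_3(dataset.docs_lang()))},
        "topics": {"input": irds_input(to_iso639_3(dataset.queries_lang()))},
        "score": {"input": irds_input()},  # qrels carry no language setting
    }


class FakeDataset:
    """Hypothetical stand-in for ir_datasets.load(...) so this runs offline."""

    def docs_lang(self) -> Optional[str]:
        return "zh"

    def queries_lang(self) -> Optional[str]:
        return "en"


codes = {"zh": "zho", "en": "eng"}  # minimal lookup for this example only
sections = irds_input_sections(
    "clirmatrix/zh/bi139-base/en/dev", FakeDataset(), lambda code: codes[code]
)
print(sections["documents"]["input"]["lang"])  # zho
print(sections["topics"]["input"]["lang"])  # eng
```

With ir_datasets, FakeDataset() would be replaced by ir_datasets.load(dataset_id) and the lambda by a fuller ISO 639-1 to ISO 639-3 lookup; the non-input settings such as process and output would still have to be added to the resulting sections.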
<h2 class="underline">Further Information</h2>

<ul>
<li><a href="https://github.com/hltcoe/patapsco">Patapsco GitHub</a></li>
</ul>
Is there a way to automatically set lang for the user from the lang provided by irds? (Or would this be counter to the design of Patapsco?)