4 changes: 4 additions & 0 deletions generate.py
@@ -396,6 +396,7 @@ def generate_index(out_dir, version, top_level_map):
<li><a href="https://arxiv.org/pdf/2103.02280.pdf">ir_datasets SIGIR resource paper</a></li>
<li>Using <kbd>ir_datasets</kbd> with&hellip;
<a href="pyterrier.html">PyTerrier</a> &middot;
<a href="patapsco.html">Patapsco</a> &middot;
<a href="ir-measures.html">ir-measures</a> &middot;
<a href="trec_eval.html">trec_eval</a> &middot;
<a href="experimaestro.html">Experimaestro</a>
@@ -615,6 +616,9 @@ def hlb(c):
    template = Template(filename=os.path.join("templates", "pyterrier.html"))
    with page_template('pyterrier.html', out_dir, version, title='PyTerrier &amp; ir_datasets', include_irds_title=False) as out:
        out.write(template.render(hl=hl))
    template = Template(filename=os.path.join("templates", "patapsco.html"))
    with page_template('patapsco.html', out_dir, version, title='Patapsco &amp; ir_datasets', include_irds_title=False) as out:
        out.write(template.render(hl=hl))
    template = Template(filename=os.path.join("templates", "ir-measures.html"))
    with page_template('ir-measures.html', out_dir, version, title='ir_measures &amp; ir_datasets', include_irds_title=False) as out:
        out.write(template.render(hl=hl, hlb=hlb))
74 changes: 74 additions & 0 deletions templates/patapsco.html
@@ -0,0 +1,74 @@
<p>
<a href="https://github.com/hltcoe/patapsco">Patapsco</a> is a framework for running cross-language
information retrieval (CLIR) experiments, developed by the <a href="https://hltcoe.jhu.edu/">Human Language
Technology Center of Excellence (HLTCOE)</a> at Johns Hopkins University.
</p>

<p>
To get started with Patapsco, see <a href="https://github.com/hltcoe/patapsco">this guide</a>.
</p>

<h2 class="underline">Basic Usage</h2>

<p>
In Patapsco, the collection source is specified in a YAML config file or, equivalently, in a Python config dictionary.
Please see <a href="https://github.com/hltcoe/patapsco/blob/master/samples/configs/irds_test.yml">this
example config file</a> for reference.
</p>

<p>
In each of the <kbd>documents</kbd>, <kbd>topics</kbd>, and <kbd>score</kbd> sections, set <kbd>format</kbd> to <kbd>irds</kbd>
in the <kbd>input</kbd> subsection to tell Patapsco to load the data with <kbd>ir_datasets</kbd>, and set <kbd>path</kbd>
to the dataset ID. The <kbd>lang</kbd> value has to match the language
Owner: Is there a way to automatically set lang for the user from the lang provided by irds? (Or would this be counter to the design of Patapsco?)
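information provided by <kbd>ir_datasets</kbd> through <kbd>dataset.docs_lang()</kbd> and
<kbd>dataset.queries_lang()</kbd>, respectively. Note that Patapsco uses three-letter ISO 639-3 language codes,
whereas <kbd>ir_datasets</kbd> provides two-letter ISO 639-1 codes, so the value must be converted
(for example, <kbd>zh</kbd> becomes <kbd>zho</kbd> and <kbd>en</kbd> becomes <kbd>eng</kbd>).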
</p>
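<p>
As a minimal sketch (not part of Patapsco or <kbd>ir_datasets</kbd>), the conversion can be done with a small
lookup table. The mapping and the helper names <kbd>to_iso639_3</kbd> and <kbd>patapsco_langs</kbd> are hypothetical,
and the table deliberately covers only a few languages; extend it for your collections.
</p>

```python
# Hypothetical helper: map the ISO 639-1 codes reported by ir_datasets
# (dataset.docs_lang() / dataset.queries_lang()) to the ISO 639-3 codes Patapsco expects.
# This table is an illustrative subset, not an exhaustive mapping.
ISO_639_1_TO_3 = {
    "ar": "ara",
    "de": "deu",
    "en": "eng",
    "es": "spa",
    "fa": "fas",
    "fr": "fra",
    "ja": "jpn",
    "ko": "kor",
    "ru": "rus",
    "zh": "zho",
}


def to_iso639_3(code):
    """Convert a two-letter ISO 639-1 code (e.g. 'zh') to ISO 639-3 (e.g. 'zho')."""
    if not code:
        # Guard against a dataset that reports no language at all.
        raise ValueError("no language code reported for this dataset")
    try:
        return ISO_639_1_TO_3[code.lower()]
    except KeyError:
        raise ValueError(f"no ISO 639-3 mapping for {code!r}; add it to ISO_639_1_TO_3") from None


def patapsco_langs(dataset):
    """Return (documents.input.lang, topics.input.lang) for a dataset.

    `dataset` is any object exposing docs_lang() and queries_lang(),
    such as the object returned by ir_datasets.load(...).
    """
    return to_iso639_3(dataset.docs_lang()), to_iso639_3(dataset.queries_lang())
```

<p>
For example, <kbd>patapsco_langs(ir_datasets.load("clirmatrix/zh/bi139-base/en/dev"))</kbd> should yield
<kbd>("zho", "eng")</kbd>, matching the <kbd>lang</kbd> values in the config below. Raising an error for unknown
codes, rather than guessing, keeps a typo from silently selecting the wrong language pipeline.
</p>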

${hl('''
documents:
  input:
    format: irds
    lang: zho
    path: clirmatrix/zh/bi139-base/en/dev
  process:
    inherit: text
  output: true

topics:
  input:
    format: irds
    lang: eng
    source: original
    encoding: utf8
    path: clirmatrix/zh/bi139-base/en/dev

score:
  input:
    format: irds
    path: clirmatrix/zh/bi139-base/en/dev
''')}

Owner (on <kbd>process: inherit: text</kbd>): Is this the field from a document to use?

Owner (on the end of the <kbd>topics</kbd> section): The example file linked above also has a "queries" section. Is that optional?

Owner (on <kbd>score:</kbd>): I presume score corresponds to the qrels?

<p>
This YAML config file can also be specified as a Python dictionary. Please refer to the
documentation of Pataspco for further information.
</p>
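<p>
A minimal sketch of the dictionary form, assuming the dictionary simply mirrors the YAML keys shown above.
The helper name <kbd>irds_config</kbd> is hypothetical, and how the dictionary is handed to Patapsco is not shown;
see the Patapsco documentation for that entry point. A complete run likely needs further sections, such as
those in the linked example file.
</p>

```python
def irds_config(dataset_id, docs_lang, topics_lang):
    """Build the documents/topics/score sections for an ir_datasets collection.

    Hypothetical helper mirroring the YAML example:
    dataset_id  -- ir_datasets ID, used as input.path in every section
    docs_lang   -- ISO 639-3 code for the documents (e.g. 'zho')
    topics_lang -- ISO 639-3 code for the topics (e.g. 'eng')
    """
    return {
        "documents": {
            "input": {"format": "irds", "lang": docs_lang, "path": dataset_id},
            "process": {"inherit": "text"},
            "output": True,
        },
        "topics": {
            "input": {
                "format": "irds",
                "lang": topics_lang,
                "source": "original",
                "encoding": "utf8",
                "path": dataset_id,
            },
        },
        # The score section only needs the dataset ID; relevance judgments come from ir_datasets.
        "score": {
            "input": {"format": "irds", "path": dataset_id},
        },
    }


# Equivalent of the YAML example above.
config = irds_config("clirmatrix/zh/bi139-base/en/dev", docs_lang="zho", topics_lang="eng")
```

<p>
Generating the three sections from one function keeps <kbd>path</kbd> identical everywhere, so the documents,
topics, and judgments cannot accidentally come from different datasets.
</p>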

<table>
<tr><th>Patapsco config key</th><th><kbd>ir_datasets</kbd> equivalent</th><th>Notes</th></tr>
<tr><td><kbd>documents</kbd></td><td><kbd>docs</kbd></td><td></td></tr>
<tr><td><kbd>documents.input.path</kbd></td><td>dataset ID</td><td></td></tr>
<tr><td><kbd>documents.input.lang</kbd></td><td><kbd>dataset.docs_lang()</kbd></td><td>Must be converted from ISO 639-1 to ISO 639-3</td></tr>
<tr><td><kbd>documents.process.inherit</kbd></td><td>the document field containing the text to use</td><td></td></tr>
<tr><td><kbd>topics</kbd></td><td><kbd>queries</kbd></td><td></td></tr>
<tr><td><kbd>topics.input.path</kbd></td><td>dataset ID</td><td></td></tr>
<tr><td><kbd>topics.input.lang</kbd></td><td><kbd>dataset.queries_lang()</kbd></td><td>Must be converted from ISO 639-1 to ISO 639-3</td></tr>
<tr><td><kbd>score</kbd></td><td><kbd>qrels</kbd></td><td></td></tr>
<tr><td><kbd>score.input.path</kbd></td><td>dataset ID</td><td></td></tr>
</table>

<h2 class="underline">Further Information</h2>

<ul>
<li><a href="https://github.com/hltcoe/patapsco">Patapsco GitHub</a></li>
</ul>