ARC Dataset and Scripts Repository

This repository contains the ARC dataset, including both the original human-annotated version and extended versions created using Large Language Models (LLMs).

Data Availability Notice

As of now, only 10% of the complete dataset has been made public.

We plan to release the full dataset, including all human annotations and LLM-generated extensions, contingent upon the acceptance of our accompanying paper at LREC.

Repository Structure

`ARC_dataset/`

This folder houses the 10% sample of the core datasets. It is organized into three sub-folders, each containing train.json, dev.json, and test.json files:

81_ARC_human_annotated: A sample of the original, human-annotated dataset.
188_ARC_moderate_extension: A sample of the moderately extended version of the dataset.
444_ARC_full_extension: A sample of the complete, fully extended dataset.
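
The layout above can be read with a few lines of Python. This is a minimal sketch, not part of the repository: `load_splits` is a hypothetical helper, and it assumes each of the three files is a single valid JSON document sitting directly inside its sub-folder. The internal schema of the records is not described here, so the sketch returns the parsed JSON as-is.

```python
import json
from pathlib import Path

# Folder and file names are taken from the listing above.
# Everything else in this sketch is an assumption.
VARIANTS = (
    "81_ARC_human_annotated",
    "188_ARC_moderate_extension",
    "444_ARC_full_extension",
)
SPLITS = ("train", "dev", "test")


def load_splits(root, variant):
    """Hypothetical helper: parse train/dev/test JSON for one sub-folder."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    folder = Path(root) / variant
    data = {}
    for split in SPLITS:
        # Assumes one JSON document per file, UTF-8 encoded.
        with open(folder / f"{split}.json", encoding="utf-8") as fh:
            data[split] = json.load(fh)
    return data
```

From the repository root, `load_splits("ARC_dataset", "81_ARC_human_annotated")` would return a dictionary keyed by `"train"`, `"dev"`, and `"test"`, provided the files match the assumptions above.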

Script and Prompt Folders

TANL/ & DENIM/: These folders contain the scripts that were run on the datasets to generate performance metrics.
Prompts/: This folder contains the specific prompts that were used with LLMs to extend the original dataset.

Redluigi1/ARC_reports

Folders and files

Latest commit

History

Repository files navigation

ARC Dataset and Scripts Repository

Data Availability Notice

Repository Structure

ARC_dataset/

Script and Prompt Folders

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`ARC_dataset/`

Packages