It'd be really nice if we could give an example of pairwise analysis, since it makes it much more intuitive to see where one system is better than the other. We could use the CNN and LSTM outputs here: https://github.com/neulab/ExplainaBoard/tree/main/data/system_outputs/sst2
Originally posted by @neubig in #563 (comment)