Question quality is poor

There are several recurring issues across different questions. They ALL need to be addressed via a careful audit of *EVERY* question:

| Problem | Example or elaboration | Consequence | Proposed solution |
| -------- | ----------------------- | -------------- | ------------------ |
| Question text is too long and/or complex | Some questions include extraneous detail (outside the scope of the concept being tested) that makes parsing harder without improving signal | The question ends up testing *reading comprehension* instead of the intended *conceptual content* | Audit long questions and simplify or reword them to focus on the target concept. Keep questions *short*, *simple*, and *direct* |
| Distractors vary along non-critical dimensions | For example, one question's correct answer is the Terracotta Army. Several distractors say "Terracotta Army" followed by extra text that goes beyond the scope of the question (e.g., "from the funerary temple ..." vs. "from the burial complex ..." vs. "from the mausoleum of ..."). | The test ends up focusing on those minor details instead of the *core* concept. | Reword questions and responses so that answers and distractors are very short (roughly 1–3 words) |
| Answers can be determined from context without expertise in the target area | Question: "What hardstone material, mined and carved in China since the Neolithic...". "Jade" appears in 3 options, so it must be jade; "gemstone" appears in 3 options, so it must be a gemstone; "virtue and purity" appears in 3 options, so it must be that. B is the option that contains all of those, so the answer must be B. This logic works for about 3/4 of the questions | This reduces the signal the questions provide about knowledge, since correct responses end up reflecting pattern-matching ability more than expertise. | Carefully audit all questions to determine whether EITHER the *question* OR the *response options*, on their own, give away enough information to guess the answer easily without expertise in the tested area |
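
Some of these checks could be partly automated before a human or model review. The sketch below is a minimal illustration under stated assumptions: the `Question` record, the word-count thresholds, and the stopword list are all hypothetical. `consensus_option` encodes the "appears in 3 options, so it must be that" trick from the third row by looking for the single option that contains every word shared by a majority of options. The example question is a made-up stand-in, not the actual item.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical thresholds; tune per domain.
MAX_STEM_WORDS = 30
MAX_OPTION_WORDS = 3

# Small illustrative stopword list (an assumption, not exhaustive).
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "from", "for", "to", "with"}


@dataclass
class Question:
    """Hypothetical question record: stem text, lettered options, correct key."""
    stem: str
    options: dict[str, str]
    answer: str


def content_words(text: str) -> set[str]:
    """Lowercased word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def consensus_option(q: Question) -> str | None:
    """Return the single option containing every word that appears in more
    than half of the options, or None if there is no such unique option."""
    word_sets = {key: content_words(text) for key, text in q.options.items()}
    counts: dict[str, int] = {}
    for words in word_sets.values():
        for w in words:
            counts[w] = counts.get(w, 0) + 1
    majority = {w for w, c in counts.items() if c > len(q.options) / 2}
    if not majority:
        return None
    matches = [key for key, words in word_sets.items() if majority <= words]
    return matches[0] if len(matches) == 1 else None


def audit(q: Question) -> list[str]:
    """Return the table's issues detected by these crude heuristics."""
    issues = []
    if len(q.stem.split()) > MAX_STEM_WORDS:
        issues.append("question text too long")
    if any(len(text.split()) > MAX_OPTION_WORDS for text in q.options.values()):
        issues.append("options too long")
    if consensus_option(q) == q.answer:
        issues.append("answer guessable from option overlap")
    return issues


# Made-up example with the same overlap pattern as the jade question.
q = Question(
    stem="What hardstone material has been carved in China since the Neolithic?",
    options={
        "A": "Jade gemstone symbolizing wealth",
        "B": "Jade gemstone symbolizing virtue and purity",
        "C": "Jade carving symbolizing virtue and purity",
        "D": "Agate gemstone symbolizing virtue and purity",
    },
    answer="B",
)
print(audit(q))  # ['options too long', 'answer guessable from option overlap']
```

These heuristics only catch the crudest giveaways; a question that passes them can still leak its answer, so they would complement the careful per-question audit rather than replace it.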

Suggested approach:
1. Create a skill to audit and improve questions for a given domain (following the general approach of the generate-questions skill)
2. For *each* question in the given domain, audit carefully for the above issues and return a reworded question and response options
3. Do this across multiple passes:
   - Pass 1: flag which issues in the table are present and reword the question
   - Pass 2: re-audit for *all* issues in the table. Continue alternating between auditing and fixing until the question passes every audit.
4. Then update the question.
5. After all questions have been updated, we will need to re-embed all questions.
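
The multi-pass loop in steps 2 to 4 amounts to alternating audit and rewrite until an audit comes back clean. A minimal sketch, assuming hypothetical `audit` and `rewrite` callables (for example, heuristic checks plus a model-based review, and a model call that rewords the question given the flagged issues). The `max_passes` cap is an added safeguard, not part of this proposal, so that a question that never converges can be set aside instead of looping forever.

```python
from __future__ import annotations

from typing import Callable, TypeVar

Q = TypeVar("Q")  # whatever representation the skill uses for a question


def audit_until_clean(
    question: Q,
    audit: Callable[[Q], list[str]],
    rewrite: Callable[[Q, list[str]], Q],
    max_passes: int = 5,
) -> tuple[Q, list[str], int]:
    """Alternate auditing and rewriting until the audit reports no issues.

    Returns the final question, any issues still open after the pass cap,
    and the number of rewrites performed."""
    rewrites = 0
    issues = audit(question)
    while issues and rewrites < max_passes:
        question = rewrite(question, issues)
        rewrites += 1
        # Re-audit for *all* issues, not just the ones that were just fixed.
        issues = audit(question)
    return question, issues, rewrites
```

Questions that come back with a non-empty list of open issues could be routed to manual review before being written back; re-embedding would still happen once, after the whole set has been updated.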
