NousResearch · Ridwannurudeen · Feb 1, 2026
diff --git a/tasks.md b/tasks.md
@@ -14,6 +14,7 @@
 | [Categorizing Information Into Hierarchies](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/categorizing-information-into-hierarchies.md) | Organizing information into hierarchical structures based on relationships or logical criteria. |
 | [Causal Chain Analysis](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/causal-chain-analysis.md) | Identifying and analyzing a sequence of events or factors that lead to a specific outcome. |
 | [Completing Analogies](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/completing-analogies.md) | Completing analogies to evaluate the ability to recognize relationships between pairs of words and apply them to new contexts. |
+| [Consensus Mechanism Analysis](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/consensus-mechanism-analysis.md) | Comparing, evaluating, and reasoning about distributed consensus mechanisms, analyzing tradeoffs between security, decentralization, and scalability, and assessing game-theoretic incentive structures in blockchain systems. |
 | [Constructing Valid Arguments](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/constructing-valid-arguments.md) | Creating logically sound arguments to support a given conclusion or claim. |
 | [Counterfactual Analysis](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/counterfactual-analysis.md) | Examining hypothetical scenarios that are contrary to what actually happened. |
 | [Critical Factor Identification in Theory of Mind](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/critical-factor-identification-in-theory-of-mind.md) | Understanding that similar agents may act differently, or different agents may act similarly, based on a third factor. |
@@ -82,11 +83,13 @@
 | [Risk Assessment In Decision Making](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/risk-assessment-in-decision-making.md) | Evaluating potential risks and benefits associated with different courses of action in a given scenario. |
 | [Second Order False Belief](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/second-order-false-belief.md) | Understanding that an agent may hold a false belief about another agent's belief, leading to misinterpretations of actions or intentions. |
 | [Sequencing Events](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/sequencing-events.md) | Arranging a set of events in chronological order. |
+| [Smart Contract Logic Reasoning](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/smart-contract-logic-reasoning.md) | Analyzing smart contract code, rules, or specifications to predict execution outcomes, identify potential vulnerabilities, and reason about state transitions in deterministic, blockchain-based systems. |
 | [Solving Riddles And Word Puzzles](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/solving-riddles-and-word-puzzles.md) | Deciphering and solving various types of riddles and word puzzles. |
 | [Solving Word Problems With Multiple Variables](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/solving-word-problems-with-multiple-variables.md) | Translating word problems into mathematical equations and solving them. |
 | [Spatial Problem Solving](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/spatial-problem-solving.md) | Using spatial reasoning to solve practical problems or puzzles. |
 | [Stack Based Reasoning] (https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/stack-based-reasoning) Using stacks and reverse thinking to convert chain of thought into stacks of instructions for others to follow|
 | [Syllogism Reasoning](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/syllogism-reasoning.md) | Providing a series of syllogisms to evaluate logical reasoning capabilities. |
+| [Tokenomics and Incentive Design Reasoning](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/tokenomics-and-incentive-design-reasoning.md) | Analyzing economic models, reward mechanisms, and incentive structures in token-based systems, including supply/demand dynamics, attack vectors, and long-term sustainability of decentralized economic models. |
 | [Trait Attribution In Behavioral Scenarios](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/trait-attribution-in-behavioral-scenarios.md) | Analyzing described behaviors or actions and inferring personality traits or characteristics that might explain those behaviors. |
 | [Trend Analysis And Forecasting](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/trend-analysis-and-forecasting.md) | Examining historical data or patterns to identify trends and make predictions about future outcomes. |
 | [Truth Table Completion](https://github.com/NousResearch/Open-Reasoning-Tasks/blob/main/tasks/truth-table-completion.md) | Completing truth tables for given logical expressions to evaluate the understanding of Boolean logic and ability to determine the truth value of complex statements. |

diff --git a/tasks/consensus-mechanism-analysis.md b/tasks/consensus-mechanism-analysis.md
@@ -0,0 +1,234 @@
+# Consensus Mechanism Analysis
+
+## Description:
+This task involves comparing, evaluating, and reasoning about distributed consensus mechanisms used in blockchain systems and other decentralized networks. It evaluates the model's ability to analyze tradeoffs between security, decentralization, and scalability, reason about Byzantine fault tolerance, identify attack vectors, and assess game-theoretic incentive structures that keep distributed systems honest.
+
+## Modality:
+Text only
+
+## Examples:
+
+### Example 1:
+
+Input:
+
+```
+A new blockchain project claims their consensus mechanism achieves all three properties simultaneously:
+1. Full decentralization (thousands of equal validators)
+2. Instant finality (transactions confirmed in under 1 second)
+3. Strong security (tolerates up to 49% malicious validators)
+
+Analyze whether this claim is feasible given known theoretical constraints.
+```
+
+Output:
+
+```
+This claim faces fundamental theoretical limitations:
+
+1. The blockchain trilemma (related to, but distinct from, the CAP theorem):
+   - Decentralization + Security + Scalability cannot all be maximized simultaneously
+   - "Instant finality" with thousands of validators creates a communication bottleneck
+
+2. Communication overhead analysis:
+   - BFT-style consensus (needed for instant finality) requires O(n²) message complexity
+   - With thousands of validators, this means millions of messages per consensus round
+   - At 1,000 validators: ~1,000,000 messages per block
+   - Network latency alone makes sub-second finality impractical at this scale
+
+3. The 49% fault tolerance claim:
+   - Classical BFT protocols (PBFT) tolerate fewer than 1/3 (~33%) Byzantine validators, not 49%
+   - Achieving 49% tolerance typically requires additional assumptions (like synchronous networks)
+   - Nakamoto consensus tolerates just under 50% adversarial hashrate but sacrifices instant finality (probabilistic instead)
+
+4. Known tradeoffs in real systems:
+   - Tendermint: Instant finality + 33% BFT, but limits validators (~150-200)
+   - Ethereum PoS: Large validator set + security, but finality takes ~13 minutes (two epochs of 6.4 minutes each)
+   - Solana: Fast finality + many validators, but requires high-performance hardware (reduces true decentralization)
+
+Conclusion: The claim is almost certainly infeasible. The project likely makes hidden compromises: either the validators aren't truly equal (delegated/weighted), finality has probabilistic caveats, or the security threshold is lower than stated. Systems can typically achieve two of the three properties, but achieving all three at the stated levels conflicts with known results in distributed systems, notably the 1/3 fault bound for BFT consensus without synchrony assumptions.
+```
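
Reviewer note on Example 1: the message-count and fault-threshold arithmetic in the output above can be checked with a minimal Python sketch. The function names are hypothetical, and the count assumes a single PBFT-style all-to-all broadcast phase in which every validator messages every other validator. Real protocols run several phases, and some use signature aggregation to reduce this cost.

```python
def all_to_all_messages(n: int) -> int:
    """Messages in one all-to-all broadcast phase among n validators.

    Assumption: each validator sends one message to each of the other
    n - 1 validators, so the count grows as O(n^2).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * (n - 1)


def max_byzantine_faults(n: int) -> int:
    """Largest f satisfying the classical BFT bound n >= 3f + 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


if __name__ == "__main__":
    for n in (100, 1_000, 5_000):
        print(n, all_to_all_messages(n), max_byzantine_faults(n))
```

For 1,000 validators this gives 999,000 messages per phase, consistent with the ~1,000,000 figure quoted above, and at most 333 Byzantine validators, well short of 49%.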
+
+---
+
+### Example 2:
+
+Input:
+
+```
+Consider two blockchain networks:
+
+Network A: Proof of Work
+- Block time: 10 minutes
+- 10,000 miners worldwide
+- Top 3 mining pools control 51% of hashrate
+- Energy consumption: 100 TWh/year
+
+Network B: Delegated Proof of Stake
+- Block time: 2 seconds
+- 21 elected block producers
+- Token holders vote for producers
+- Energy consumption: 0.01 TWh/year
+
+Which network is more decentralized? Consider multiple dimensions of decentralization.
+```
+
+Output:
+
+```
+Decentralization is multi-dimensional, and these networks make different tradeoffs:
+
+Dimension 1 - Validator/Miner Distribution:
+   - Network A: 10,000 miners, but 3 pools control majority → high nominal count, concentrated actual power
+   - Network B: Only 21 producers → low count, but each is explicitly elected
+   - Edge: Network A has more participants, but effective decentralization is similar due to pool concentration
+
+Dimension 2 - Barrier to Entry:
+   - Network A: Requires expensive, specialized hardware (ASICs) + cheap electricity → high capital barrier
+   - Network B: Requires community votes, not capital investment in hardware → political barrier instead
+   - Edge: Different barrier types. Network A excludes the poor, Network B excludes the unpopular
+
+Dimension 3 - Censorship Resistance:
+   - Network A: A censored transaction can be included by any of the ~10,000 miners eventually
+   - Network B: If a majority of the 21 producers collude, they can censor indefinitely
+   - Edge: Network A. More independent block producers means more paths to inclusion
+
+Dimension 4 - Governance:
+   - Network A: Changes require social consensus among miners, developers, and users (hard forks are contentious)
+   - Network B: 21 producers can coordinate changes quickly, but voters can replace them
+   - Edge: Network A has slower but more distributed governance
+
+Dimension 5 - Geographic Distribution:
+   - Network A: Mining concentrates where electricity is cheapest (often one or two countries)
+   - Network B: Producers can operate from anywhere (low energy needs)
+   - Edge: Network B potentially more geographically distributed
+
+Overall assessment: Network A is more decentralized in censorship resistance and governance, while Network B offers better geographic distribution and lower barriers to participation. Neither is definitively "more decentralized"—the answer depends on which dimensions you prioritize for your use case.
+```
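
Reviewer note on Example 2: Dimension 1 can be made concrete by counting the smallest set of entities that together control a majority, a metric often called the Nakamoto coefficient (not named in the example itself). The sketch below is illustrative only. The individual pool shares (20%, 17%, 14%) are hypothetical values chosen to sum to the stated 51%, and splitting the remaining 49% evenly across 49 miners is also an assumption.

```python
def min_entities_for_majority(weights):
    """Smallest number of entities whose combined weight exceeds half the total.

    Weights are integers (for example percentage points or vote counts)
    so the comparison is exact.
    """
    total = sum(weights)
    running = 0
    for count, weight in enumerate(sorted(weights, reverse=True), start=1):
        running += weight
        if running * 2 > total:
            return count
    raise ValueError("weights must contain at least one positive value")


# Network A: three pools with 51% combined (hypothetical split),
# remaining 49% spread over 49 small miners (hypothetical).
network_a = [20, 17, 14] + [1] * 49

# Network B: 21 block producers with equal weight.
network_b = [1] * 21

if __name__ == "__main__":
    print("Network A:", min_entities_for_majority(network_a))
    print("Network B:", min_entities_for_majority(network_b))
```

Under these assumptions Network A needs 3 colluding entities and Network B needs 11, which illustrates why a high nominal miner count does not by itself imply a more distributed set of block producers.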
+
+---
+
+### Example 3:
+
+Input:
+
+```
+In a Proof of Stake system, validators must lock (stake) 32 ETH to participate. The protocol randomly selects validators to propose blocks and assigns committees to attest. Validators earn rewards for correct behavior and lose stake (get "slashed") for misbehavior.
+
+A wealthy attacker acquires 34% of all staked ETH. What attacks become possible, what remains impossible, and what economic consequences does the attacker face?
+```
+
+Output:
+
+```
+With 34% of stake, the attacker crosses the critical 33% threshold for BFT systems:
+
+POSSIBLE ATTACKS:
+
+1. Liveness attack (halting finality):
+   - Ethereum's Casper FFG requires 2/3 (66.7%) of validators to agree for finality
+   - The attacker can withhold attestations, preventing the 66.7% threshold
+   - Result: The chain continues producing blocks, but no epoch reaches finality
+   - Impact: Transactions process but lack the strong guarantee of irreversibility
+
+2. Censorship by proposers and attesters:
+   - Random sampling gives the attacker ~34% of each attestation committee on average, plus ~34% of block proposals
+   - Can exclude specific transactions from its own blocks and refuse to attest to blocks containing them
+   - Effectiveness: Partial. Transactions may be delayed but will likely be included eventually by honest proposers, whose blocks honest attesters still support
+
+IMPOSSIBLE ATTACKS (even with 34%):
+
+1. Double spending / finality reversion:
+   - Finalizing a conflicting checkpoint unilaterally requires controlling 2/3+ of stake (66.7%)
+   - With 34%, the attacker cannot create conflicting finalized checkpoints on its own
+
+2. Creating invalid blocks:
+   - Other validators independently verify block validity
+   - Invalid blocks are rejected regardless of the proposer's stake
+
+3. Stealing other validators' funds:
+   - Protocol rules are enforced by all nodes, not by majority vote
+
+ECONOMIC CONSEQUENCES:
+
+1. Inactivity penalties for the liveness attack:
+   - If the attacker's validators go offline to prevent finality, they face the inactivity leak (a penalty distinct from slashing)
+   - Penalties escalate quadratically over time
+   - After ~2-3 weeks of non-finality, the attacker's stake would be significantly reduced
+   - Eventually, their share drops below 33%, restoring finality
+
+2. Cost of attack:
+   - 34% of staked ETH is tens of billions of dollars
+   - The inactivity leak (or slashing, if the attacker equivocates) could destroy a significant portion of this
+   - The attack also devalues ETH, destroying the attacker's remaining holdings
+
+3. Social layer defense:
+   - The community could coordinate a hard fork to slash the attacker's stake entirely
+   - This "nuclear option" makes the attack economically suicidal
+
+Key insight: Proof of Stake is designed so that attacks are economically self-defeating. The quadratic inactivity leak specifically targets the 33%+ liveness attack, gradually restoring finality even if the attacker persists.
+```
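
Reviewer note on Example 3: the threshold reasoning in the output above can be sketched with exact fractions. This is a simplified model, not the protocol's actual accounting. It assumes finality needs at least 2/3 of total stake to attest, that all honest stake stays online and is not itself penalized, and the function names are hypothetical.

```python
from fractions import Fraction

ONE_THIRD = Fraction(1, 3)
TWO_THIRDS = Fraction(2, 3)


def attacker_capabilities(share: Fraction) -> dict:
    """Which finality-related powers a given stake share provides."""
    return {
        # Withholding more than 1/3 keeps honest stake below the 2/3 threshold.
        "can_block_finality": share > ONE_THIRD,
        # Finalizing without any honest votes needs at least 2/3.
        "can_finalize_alone": share >= TWO_THIRDS,
    }


def loss_needed_to_restore_finality(share: Fraction) -> Fraction:
    """Fraction of the attacker's own stake that must be burned before
    the honest validators alone hold 2/3 of the remaining stake.

    Derivation: honest / (honest + a) >= 2/3  implies  a <= honest / 2.
    """
    honest = 1 - share
    max_attacker_stake = honest / 2
    if share <= max_attacker_stake:
        return Fraction(0)
    return (share - max_attacker_stake) / share


if __name__ == "__main__":
    share = Fraction(34, 100)
    print(attacker_capabilities(share))
    print(loss_needed_to_restore_finality(share))
```

In this model a 34% attacker can block finality but cannot finalize alone, and finality can resume once the leak has burned 1/34 (about 2.9%) of the attacker's stake. How long that takes depends on protocol parameters not modeled here.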
+
+---
+
+### Example 4:
+
+Input:
+
+```
+A blockchain uses an AI-augmented consensus mechanism called "Optimistic Democracy" where:
+- Multiple AI validators independently process the same input
+- If all validators agree, the result is accepted immediately
+- If there is disagreement, a more expensive "full consensus" round is triggered
+- Validators are rewarded for agreeing with the final consensus and penalized for disagreement
+
+What are the game-theoretic implications of this design? Identify at least one incentive misalignment.
+```
+
+Output:
+
+```
+Game-theoretic analysis of Optimistic Democracy:
+
+INTENDED INCENTIVES:
+- Validators are motivated to be accurate (correct results lead to agreement and rewards)
+- The optimistic path (immediate acceptance) makes the common case efficient
+- The penalty for disagreement discourages random or lazy validation
+
+INCENTIVE MISALIGNMENT #1 - Herding Behavior:
+   - Validators are rewarded for AGREEING, not for being CORRECT
+   - If a validator suspects it has the minority opinion, it's rational to change its answer to match the expected majority
+   - This creates pressure toward conformity rather than independent verification
+   - Worst case: All validators converge on the same wrong answer because each fears being the dissenter
+   - This mirrors the "Keynesian beauty contest" problem in economics
+
+INCENTIVE MISALIGNMENT #2 - Discouraging Dissent:
+   - Triggering full consensus is costly for the network
+   - If a validator detects an error but knows dissenting will trigger the expensive path, it faces a dilemma:
+     - Dissent: Correct behavior, but risks penalty if other validators don't agree
+     - Stay silent: Incorrect behavior, but guaranteed reward
+   - Rational validators may let small errors pass to avoid the cost of full consensus
+
+INCENTIVE MISALIGNMENT #3 - Copycat Strategies:
+   - If one validator is known to be highly accurate, others might simply copy its outputs
+   - This reduces the system to effectively having one validator, destroying the security benefit of multiple independent checks
+   - Detection is difficult if validators can observe each other's preliminary results
+
+POSSIBLE FIXES:
+1. Commit-reveal scheme: Validators commit to answers before seeing others' responses
+2. Reward dissent: Provide a bonus for validators who correctly identify errors (even if they're in the minority)
+3. Random audits: Occasionally inject known-answer challenges to verify independent processing
+4. Asymmetric penalties: Make the penalty for agreeing on a wrong answer worse than the penalty for lone dissent
+
+Key insight: Any consensus mechanism that rewards agreement must carefully balance this against the need for independent verification. The "optimistic" assumption works well when errors are rare, but the incentive structure must ensure that errors are caught when they do occur.
+```
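
Reviewer note on Example 4: the commit-reveal fix listed under POSSIBLE FIXES can be sketched with a salted hash commitment. This is a minimal illustration using Python's standard library, not a description of how any particular protocol implements it. The `commit` and `verify` helpers are hypothetical names, and a real system would also need deadlines for the commit and reveal phases and penalties for validators who never reveal.

```python
import hashlib
import hmac
import secrets


def commit(answer: str, salt: bytes) -> str:
    """Return a SHA-256 commitment to an answer.

    The random salt prevents other validators from guessing the answer
    by hashing a small set of likely outputs.
    """
    return hashlib.sha256(salt + answer.encode("utf-8")).hexdigest()


def verify(commitment: str, answer: str, salt: bytes) -> bool:
    """Check a revealed (answer, salt) pair against an earlier commitment."""
    return hmac.compare_digest(commitment, commit(answer, salt))


if __name__ == "__main__":
    # Phase 1: each validator publishes only its commitment.
    salt = secrets.token_bytes(16)
    published = commit("result: approve", salt)

    # Phase 2: after all commitments are in, validators reveal.
    print(verify(published, "result: approve", salt))  # honest reveal
    print(verify(published, "result: reject", salt))   # changed answer
```

Because answers stay hidden until every commitment is published, a validator cannot copy a peer or switch to the expected majority after seeing other responses, which addresses the herding and copycat misalignments described above.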
+
+## Tags:
+- Consensus Mechanisms
+- Distributed Systems
+- Byzantine Fault Tolerance
+- Game Theory
+- Blockchain Security
+- Decentralization Analysis
+- Incentive Design
+- Synthetic