Conversation

@ipalenov
Collaborator

No description provided.

@ipalenov ipalenov requested a review from staswilf August 19, 2025 13:53
@codealive-github-integration

🔍 Code Review

Hello @ipalenov! Thank you for introducing batch message sending to PinkRabbitMQ. This is a valuable feature that promises significant performance improvements.

Overall, the change to enable batch publishing is a positive step towards optimizing message throughput. However, several critical areas related to input validation, error handling, and test coverage need attention to ensure the stability, security, and robustness of this new functionality.

🎯 Key Changes & Highlights

✨ **Batch Publishing Feature**

The introduction of BatchPublish allows for sending multiple messages in a single operation, which is a significant architectural improvement for performance-critical scenarios. This can drastically reduce network overhead and improve overall message throughput compared to sending individual messages.
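To make the contract concrete, here is a hedged sketch of the batch payload shape that `batchPublishImpl` appears to expect: a JSON array of message objects, each carrying at least `routingKey` and `body` (those two field names come from the code under review); the per-message `props` object shown here is an assumption. Python is used for illustration only; the component itself is C++.

```python
import json

# Hypothetical batch payload; "routingKey" and "body" are taken from the
# review, the "props" field is an assumed extension.
batch = [
    {"routingKey": "orders.created", "body": '{"orderId": 1}'},
    {"routingKey": "orders.created", "body": '{"orderId": 2}',
     "props": {"content_type": "application/json"}},
]

payload = json.dumps(batch)      # the single string handed to BatchPublish
messages = json.loads(payload)   # what batchPublishImpl would parse back out

print(len(messages), messages[0]["routingKey"])
```

One serialized string per batch is what removes the per-message call overhead that individual publishes incur.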

🚨 Issues & Suggestions

🔐 DoS via Unvalidated JSON Input

Malformed or oversized JSON input can lead to resource exhaustion and Denial of Service.

Problem: The batchPublishImpl and headersFromJson functions directly parse user-supplied JSON without validating its size or structure. An attacker could send an excessively large or malformed JSON string, causing high CPU/memory consumption, parsing errors, and potentially a Denial of Service (DoS) for the component and dependent services.

Fix: Implement robust input validation and error handling. Before calling json::parse, check the length of the input string (ctx.stringParamUtf8() or propsJson). If it exceeds a reasonable maximum (e.g., 1-10 MB), reject the request. Additionally, wrap json::parse calls in try-catch blocks to gracefully handle nlohmann::json::parse_error exceptions, preventing crashes and providing informative error messages.

```cpp
// Example for batchPublishImpl
try {
    const std::string& jsonInput = ctx.stringParamUtf8();
    if (jsonInput.length() > MAX_BATCH_JSON_SIZE) { // Define MAX_BATCH_JSON_SIZE
        ctx.setError(u16Converter.from_bytes("Batch messages JSON input too large."));
        return false;
    }
    auto messages = json::parse(jsonInput);
    // ... rest of the logic ...
} catch (const json::parse_error& e) {
    ctx.setError(u16Converter.from_bytes("JSON parsing error for messages: " + std::string(e.what())));
    return false;
}
```

```cpp
// Example for headersFromJson
AMQP::Table RabbitMQClient::headersFromJson(const std::string& propsJson, bool forConsume)
{
    if (propsJson.empty()) {
        return AMQP::Table();
    }
    if (propsJson.length() > MAX_HEADERS_JSON_SIZE) { // Define MAX_HEADERS_JSON_SIZE
        // Handle the error appropriately, e.g., log and return an empty table or throw
        return AMQP::Table();
    }
    try {
        auto object = json::parse(propsJson);
        return headersFromJson(object, forConsume);
    } catch (const json::parse_error& e) {
        // Log e.what() and return an empty table rather than crashing
        return AMQP::Table();
    }
}
```

🏗️ Unsafe Batch Message Access

Direct access to JSON fields in batch messages can cause crashes if keys are missing.

Problem: In batchPublishImpl, it["routingKey"] and it["body"] are accessed directly without checking if these keys exist in each message object within the batch. If a message is malformed and lacks these mandatory fields, nlohmann::json will throw an exception, leading to a component crash and potential Denial of Service for the entire batch operation.

Fix: Before accessing routingKey and body, verify their existence using it.contains(). If a key is missing, handle the error gracefully by logging it, skipping the malformed message, or setting an error on the CallContext.

```cpp
for (auto& it : messages) {
    if (!it.contains("routingKey") || !it.contains("body")) {
        ctx.setError(u16Converter.from_bytes("Batch message missing 'routingKey' or 'body'. Skipping message."));
        // Log the error with more context if possible
        continue; // Skip this malformed message and process the others
    }
    std::string routingKey = it["routingKey"];
    std::string message = it["body"];
    // ... rest of the logic ...
}
```

✨ Incomplete Test Coverage

New batch publish functionality lacks comprehensive test coverage for edge cases.

Problem: The BatchPublish method introduces significant new functionality, but current tests primarily cover basic success scenarios. Critical edge cases and error paths, such as malformed JSON input, empty message arrays, messages missing mandatory fields, and large batches, are not adequately tested. This increases the risk of regressions and unexpected behavior in production.

Fix: Expand the test suite in test/test_amqp.py to include comprehensive tests for batchPublishImpl. Add test cases for:

  • Invalid JSON syntax for the messages array.
  • An empty messages array.
  • Messages within the batch missing routingKey or body.
  • Messages with malformed properties or headers JSON.
  • Testing the persistent=true parameter.
  • Testing very large batches to assess performance and memory implications.
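The malformed-payload cases above can be sketched with stdlib-only Python before wiring them into `test/test_amqp.py`; the helper below mirrors the validation the component would need, and the expected behaviors noted in comments are assumptions, not observed component output.

```python
import json

# Hypothetical edge-case payloads for BatchPublish tests.
cases = {
    "invalid_json": "[{not valid json",          # parse error expected
    "empty_array": "[]",                         # vacuously well-formed; component may reject
    "missing_fields": '[{"routingKey": "k"}]',   # "body" absent
    # props is an invalid JSON string; top-level shape is still fine,
    # so this must be caught later by headersFromJson:
    "bad_props": '[{"routingKey": "k", "body": "b", "props": "{oops"}]',
}

def is_well_formed(payload: str) -> bool:
    """True if the payload parses and every message has routingKey and body."""
    try:
        messages = json.loads(payload)
    except json.JSONDecodeError:
        return False
    return all(
        isinstance(m, dict) and "routingKey" in m and "body" in m
        for m in messages
    )

results = {name: is_well_formed(p) for name, p in cases.items()}
```

Each case should map to one test asserting the component reports a clean error rather than crashing.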

🔐 Validate Routing Key

Add validation for routingKey format to prevent downstream issues.

Why: The routingKey is extracted directly from user-supplied JSON without specific validation. While RabbitMQ handles routing keys, if downstream consumers expect a specific format, a malformed key could lead to unexpected behavior or vulnerabilities in those systems. Adding validation improves overall system robustness and security.

How: Implement an isValidRoutingKey function based on expected patterns (e.g., alphanumeric characters, specific delimiters). If the component cannot enforce specific formats, document this responsibility for calling applications and consumers.

```cpp
std::string routingKey = it["routingKey"];
if (!isValidRoutingKey(routingKey)) { // Implement isValidRoutingKey based on project standards
    ctx.setError(u16Converter.from_bytes("Invalid routing key format."));
    return false;
}
```
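As one possible shape for that check, here is a sketch in Python for illustration (the component would implement it in C++, e.g., with std::regex). The pattern, dot-separated segments of letters, digits, '-' and '_', capped at AMQP's 255-byte routing-key limit, is an assumed convention; adjust it to the project's actual standards.

```python
import re

# Assumed convention: dot-separated alphanumeric segments, e.g. "orders.created".
_ROUTING_KEY_RE = re.compile(r"^[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)*$")

def is_valid_routing_key(key: str) -> bool:
    # AMQP 0-9-1 limits routing keys (short strings) to 255 bytes.
    return len(key.encode("utf-8")) <= 255 and bool(_ROUTING_KEY_RE.match(key))
```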

⚡ Use string::length()

Replace `strlen()` with `std::string::length()` for efficiency.

Why: Using strlen(message.c_str()) is an O(N) operation that iterates through the string to find the null terminator. For std::string, message.length() is an O(1) operation as the length is already stored. While the impact might be small for single messages, this optimization can accumulate when sending many messages in a batch, contributing to slightly higher throughput and improved CPU efficiency.

How: Update src/RabbitMQClient.cpp at lines 243 and 272 to use message.length():

```cpp
// In src/RabbitMQClient.cpp, lines 243 and 272
AMQP::Envelope envelope(message.c_str(), message.length());
```

⚡ Validate Performance Gains

Provide benchmarks to quantify BatchPublish performance benefits.

Why: The BatchPublish method is a performance optimization. Without explicit benchmarks or performance metrics, it's difficult to quantify the actual gains over sending individual messages in a loop. This data is crucial for validating the architectural decision, setting appropriate expectations, and guiding its adoption by other teams.

How: Conduct and document performance benchmarks comparing BatchPublish with BasicPublish in a loop. Measure key metrics such as throughput (messages/second), latency, and CPU/memory usage for various batch sizes and message complexities. Include this data in the PR description or a linked document.
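A minimal harness for such a benchmark could look like the sketch below. The two publish functions are stubs standing in for BasicPublish-in-a-loop versus BatchPublish (real numbers must come from the actual component against a live broker); the point is the measurement structure, not the stub timings.

```python
import json
import time

def publish_one(routing_key: str, body: str) -> None:
    # Stub: serializes one message; stands in for a single BasicPublish call.
    json.dumps({"routingKey": routing_key, "body": body})

def publish_batch(messages: list) -> None:
    # Stub: one serialization for the whole batch, as BatchPublish would take.
    json.dumps(messages)

messages = [{"routingKey": "bench", "body": f"msg-{i}"} for i in range(10_000)]

t0 = time.perf_counter()
for m in messages:
    publish_one(m["routingKey"], m["body"])
loop_s = time.perf_counter() - t0

t0 = time.perf_counter()
publish_batch(messages)
batch_s = time.perf_counter() - t0

print(f"loop: {loop_s:.4f}s  batch: {batch_s:.4f}s")
```

Run the same comparison at several batch sizes (e.g., 10, 1 000, 100 000) and record throughput alongside CPU/memory usage.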


🚦 Merge Recommendation

🚫 Request Changes - Critical issues require resolution before merging

Reasoning

The introduction of batch publishing is a valuable feature, but the identified high-severity issues related to Denial of Service vulnerabilities, unsafe JSON access, and insufficient test coverage pose significant risks to system stability and security. These must be addressed to ensure the robustness and reliability of the new functionality before this change can be merged.


🤖 This review was generated by CodeAlive AI

AI can make mistakes, so please verify all suggestions and use with caution. We value your feedback about this review - please contact us at support@codealive.ai.

💡 Tip: Comment /codealive review to retrigger reviews. You can also manage reviews at https://app.codealive.ai/pull-requests.
