-
Couldn't load subscription status.
- Fork 2.5k
ETag for static file server #2243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Programmed feature yhirose#2242. First of, the static file server now sends HTTP response header "ETag". Following HTTP requests by the client which include HTTP request header "If-None-Match" are only served if the value for HTTP response header "ETag" is not included in the value of HTTP request header "If-None-Match", otherwise an HTTP response with status code 304 is served which includes the HTTP response header "ETag" again that would have been sent with a normal status code of 200. Useful resources: - https://http.dev/caching - https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/ETag - https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/If-None-Match - https://www.rfc-editor.org/rfc/rfc9110.html#name-304-not-modified
|
Please check for style and do adjustements. Please adjust the location of the test. I just put it at the end and the names should be adjusted too. Does |
|
Test "ServerTest.StaticFileRangeBigFile" causes a segmentation fault. Do you have an idea for the reason? |
Added more comments. Check for method to be GET or HEAD. Explicitly clear the body.
|
The solution was done with one line. A c-string did not have null termination. 100% tests passed, 0 tests failed out of 444
Total Test time (real) = 365.31 sec
The following tests did not run:
114 - BindServerTest.BindDualStack (Disabled)
390 - SSLClientServerTest.LargeDataTransfer (Disabled) |
|
The function This is just a thought and has no effect on on this pull request. Only the tests would be needed to be updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgfm. But I suggest you to move etag generation algo from handle_file_request and add func like add_etag_enable() what set the algo. class Server (its user) should know about the extension. Explicit is better than implicit.
And second moment. On 7924 line you check etag.length() >= 2 and use then .at(0) and .at(1) that check bounds: .length() > 0 and .length() > 1, it is useless :) (ref: https://en.cppreference.com/w/cpp/string/basic_string/at.html).
These remarks do not affect logic, it is correct. And your work is great.
|
Thank you for the review. I am glad that this feature is well perceived. Regarding your first recommendation, do you suggest a switch (member variable on class From my current knowledge, One reason for disabling might be due to performance. Do I even understand you correctly? What do you think? Regarding your second recommendation, I know that I think the condition needs to be like before Normally, I would use |
|
About manual enabling etag support, I think |
Server, new member variable: is_etag_enabled Server, new member function: get_is_etag_enabled() Server, new member function: set_is_etag_enabled() Enable ETag, If-Match and If-None-Match conditionally depending on is_etag_enabled. Programme If-Match.
|
Done. Please review. All tests pass either with or without macro While the implementation is fine, please take a look at minor things I just assumed and guessed.
Adjust to have consistency. These are things maintainers need to do. Preferably, add your own commits. |
|
@Spixmaster thanks for the contribution. Since it's a small pull request, I have to review it carefully. Before I start, I just let the 'Claude Sonnet 4.5' to review this pull request. Here is the report. Could you take a look at it and make necessary changes? Thanks! PR #2243 Detailed Investigation Report📋 SummaryPR Number: #2243 🎯 PurposeImplement HTTP ETag functionality for the static file server to improve client caching efficiency. 📝 Detailed Changes1. Added APIsPublic APIclass Server {
public:
bool get_is_etag_enabled() const;
Server &set_is_etag_enabled(const bool &enabled);
private:
std::atomic<bool> is_etag_enabled{false}; // Default: disabled
};2. ETag Implementation DetailsGeneration Method
Implementation Code (httplib.h lines 7895-7973)#ifdef CPPHTTPLIB_OPENSSL_SUPPORT
if (is_etag_enabled)
if (req.method == "GET" || req.method == "HEAD") {
// Copy entire file to std::string
const std::string file_data(mm->data(), mm->size());
const std::string etag = R"(")" + detail::SHA_512(file_data) + R"(")";
res.set_header("ETag", etag);
// Process If-Match / If-None-Match headers
// ...
}
#endif3. Supported HTTP HeadersIf-Match
If-None-Match
4. Test CoverageAdded Tests
Test Results: All 443 tests passed ✅ 🐛 Discovered Issues🔴 Critical: Operator Precedence BugProblematic Code (httplib.h lines 7964-7966)} else if (req.has_header("If-None-Match") &&
etags.find("*") != etags.cend() ||
etags.find(etag) != etags.cend()) {ProblemDue to operator precedence, this is interpreted as: (req.has_header("If-None-Match") && etags.find("*") != etags.cend())
||
etags.find(etag) != etags.cend()ImpactEven when If-None-Match header is not present, if there's a matching ETag in the ProofValidation program Correct Implementation} else if (req.has_header("If-None-Match") &&
(etags.find("*") != etags.cend() ||
etags.find(etag) != etags.cend())) {Fix Priority: 🔴 MANDATORY - Must be fixed before merge
|
| File Size | Processing Time | Throughput | Requests/sec |
|---|---|---|---|
| 1 KB | 0.01 ms | 163 MB/s | 166,945 req/s |
| 10 KB | 0.01 ms | 774 MB/s | 79,239 req/s |
| 100 KB | 0.11 ms | 850 MB/s | 8,700 req/s |
| 1 MB | 1.00 ms | 1,001 MB/s | 1,001 req/s |
| 10 MB | 7.92 ms | 1,263 MB/s | 126 req/s |
| 50 MB | 38.44 ms | 1,301 MB/s | 26 req/s |
| 100 MB | 82.01 ms | 1,219 MB/s | 12 req/s |
Problems
-
Computed on Every Request
- No caching mechanism
- Multiple requests for the same file recompute SHA-512 every time
-
Increased Memory Usage
- mmapped file + std::string copy = 2x memory usage
- 100MB file → 200MB+ memory consumption
-
DoS Attack Risk
- Concurrent requests to large files can make server unresponsive
- CPU usage reaches 100%
- Memory exhaustion
-
Real-world Impact Examples
- Serving 50MB PDF files → 38ms overhead per request
- 10 concurrent requests → 380ms latency
- Unusable in high-traffic environments
Issue 2: Comparison with Other Implementations
Nginx Implementation
ETag: "{last_modified_time_hex}-{file_size_hex}"
- Computation time: microseconds
- Memory usage: tens of bytes
- No file content reading required
Apache Implementation (default)
ETag: "{inode}-{size}-{mtime}"
- Computation time: nanoseconds
- Memory usage: minimal
Recommended Alternatives
Option 1: Metadata-based ETag (Recommended) ✅
// Use only file metadata
std::string generate_etag_from_metadata(const std::string& path, const FileStat& stat) {
std::ostringstream oss;
oss << std::hex << stat.mtime << "-" << stat.size;
return "\"" + oss.str() + "\"";
}Advantages:
- Computation time: microseconds
- Memory usage: tens of bytes
- Compatible with Nginx/Apache
- Reliably detects file content changes
Disadvantages:
- If file content is the same but metadata changes, ETag also changes (rare case)
Option 2: Hash Caching
struct ETagCacheEntry {
std::string etag;
time_t mtime;
size_t size;
};
std::unordered_map<std::string, ETagCacheEntry> etag_cache_;Advantages:
- Fast for repeated access to same file
- Content-based accurate ETag
Disadvantages:
- Cache management complexity
- Increased memory usage
- First request is still slow
Option 3: Size Threshold
const size_t ETAG_HASH_SIZE_LIMIT = 1 * 1024 * 1024; // 1MB
if (file_size < ETAG_HASH_SIZE_LIMIT) {
// SHA-512 hash
} else {
// Metadata-based
}🔍 Other Issues
Issue 3: Missing Whitespace Handling
RFC 9110 Specification
If-None-Match: "etag1" , "etag2" , "etag3"
^^^^^^^^ Whitespace allowed
Current Implementation
detail::split(header_value.c_str(), ..., ',', [&](const char *b, const char *e) {
std::string etag(b, e); // No trimming
// ...
});Impact: ETag values containing spaces may not be compared correctly (low risk)
Recommended Fix
std::string etag(b, e);
// Trim leading and trailing whitespace
etag.erase(0, etag.find_first_not_of(" \t"));
etag.erase(etag.find_last_not_of(" \t") + 1);Issue 4: Race Condition Possibility (Low Risk)
File could be modified by another process after being mmapped but before ETag calculation.
Impact: Inconsistent ETag generation (rare case)
Mitigation: Not a problem in normal operation
✅ Correctly Implemented Features
1. HTTP Specification Compliance (RFC 9110)
- Status code usage: 200, 304, 412 ✅
- ETag quotation handling ✅
- If-Match/If-None-Match semantics ✅
2. Weak ETag Handling
if (req.has_header("If-None-Match") &&
etag.length() >= 2 && etag.at(0) == 'W' && etag.at(1) == '/') {
etag.erase(0, 2); // Remove W/
}- RFC compliant: If-None-Match doesn't distinguish Weak/Strong ✅
- If-Match distinguishes Weak (correct) ✅
3. Wildcard Handling
*→ matches everything ✅- Comma-separated multiple ETag values supported ✅
4. Opt-in Design
- Disabled by default ✅
- User must explicitly enable ✅
5. Scope Limitation
- GET/HEAD methods only ✅
- Static file serving only ✅
- OpenSSL dependency explicit ✅
🔒 Security Assessment
Vulnerability Analysis
✅ No Critical Vulnerabilities
- Injection attacks: None
- Path traversal: None
- Memory corruption: None
- Information disclosure: None
⚠️ DoS Attack Risk (Medium)
Scenario:
- Attacker sends concurrent requests to large files (e.g., 100MB videos)
- Each request computes SHA-512 → CPU 100%
- Memory usage spikes
- Service unavailable to legitimate users
Mitigation:
- Introduce file size limits
- Rate limiting
- Adopt metadata-based ETags
Timing Attack Resistance
SHA-512 is a cryptographic hash function and resistant to timing attacks ✅
📊 Overall Assessment
Functionality
- HTTP Specification Compliance: ✅ Excellent
- Implementation Accuracy:
⚠️ Has bugs (needs fixing) - Test Coverage: ✅ Good
Performance
- Small Files (<1MB): ✅ Acceptable
- Medium Files (1-10MB):
⚠️ Needs attention - Large Files (>10MB): ❌ Unusable
Security
- Vulnerabilities: ✅ None
- DoS Resistance:
⚠️ Weak
Code Quality
- Readability: ✅ Good
- Maintainability: ✅ Good
- Documentation:
⚠️ Missing performance characteristics
🎯 Merge Decision
Current Status: ❌ NOT MERGEABLE
Reasons:
- 🔴 Critical Bug: Operator precedence bug (mandatory fix)
- 🟡 Performance: Serious performance issues with large files
Merge Conditions
Mandatory Fixes (Merge Blockers)
-
✅ Fix If-None-Match operator precedence bug
} else if (req.has_header("If-None-Match") && (etags.find("*") != etags.cend() || etags.find(etag) != etags.cend())) {
-
⚠️ Change ETag generation method or add documentation- Option A: Change to metadata-based (recommended)
- Option B: Introduce file size limit (e.g., < 1MB only)
- Option C: Document limitations (minimum requirement)
Recommended Fixes
- Add whitespace trimming
- Add performance benchmarks
- Document limitations in README
📋 Fix Proposals
Proposal 1: Metadata-based ETag (Recommended)
#ifdef CPPHTTPLIB_OPENSSL_SUPPORT
if (is_etag_enabled)
if (req.method == "GET" || req.method == "HEAD") {
// Generate ETag from metadata (Nginx style)
std::ostringstream oss;
oss << "\"" << std::hex << stat.st_mtime << "-" << stat.st_size << "\"";
std::string etag = oss.str();
res.set_header("ETag", etag);
// ... If-Match/If-None-Match processing ...
}
#endifAdvantages:
- Computation time: O(1), microseconds
- Memory usage: minimal
- Usable for all file sizes
- Industry standard (Nginx compatible)
Proposal 2: Hybrid Approach
const size_t ETAG_HASH_SIZE_LIMIT = 1 * 1024 * 1024; // 1MB
std::string etag;
if (mm->size() < ETAG_HASH_SIZE_LIMIT) {
// Small files: content-based hash
const std::string file_data(mm->data(), mm->size());
etag = R"(")" + detail::SHA_512(file_data) + R"(")";
} else {
// Large files: metadata-based
std::ostringstream oss;
oss << "\"" << std::hex << stat.st_mtime << "-" << stat.st_size << "\"";
etag = oss.str();
}🔗 References
RFC Specifications
- RFC 9110 - HTTP Semantics
- RFC 9110 Section 8.8.3 - ETag
- RFC 9110 Section 13.1.1 - If-Match
- RFC 9110 Section 13.1.2 - If-None-Match
Related Issues
Validation Files
work/etag_bug_test.cpp- Proof of operator precedence bugwork/etag_performance_test.cpp- Performance benchmarkwork/pr_2243_analysis.md- Initial analysis report
📝 Conclusion
PR #2243 has the right direction for adding ETag functionality, but cannot be merged as-is for the following reasons:
🔴 Blockers
- If-None-Match operator precedence bug - Must be fixed
- Performance issues with large files - Critical practical limitation
Recommended Actions
For Developer (Spixmaster)
- Fix If-None-Match bug
- Change ETag generation method to one of:
- Metadata-based (recommended)
- Size-limited hashing
- Hybrid approach
- Add performance tests
For Reviewer (yhirose)
- Verify operator precedence bug
- Evaluate performance characteristics
- Decide policy for ETag generation method
Warning for Users
If the current PR is merged:
- Small files (<1MB): Safe to use
- Large files (>10MB): Avoid (performance degradation)
Rating:
Last Updated: October 25, 2025
Fix operator precedence issue. Made httplib::detail::Filestat::st_ public as it is needed. Changed ETag constitution. Adjusted tests. Resources: https://stackoverflow.com/questions/40504281/c-how-to-check-the-last-modified-time-of-a-file https://serverfault.com/questions/690341/algorithm-behind-nginx-etag-generation
|
The result from Claude is good. Please adjust for consistency. I had to make Discovered Issues: issue 1Proper issue. Fixed it. Serious Performance Issues: issue 1I did not verify but it is logical. I changed the ETag constitution to the one from Nginx which I verified with https://stackoverflow.com/questions/40504281/c-how-to-check-the-last-modified-time-of-a-file and https://serverfault.com/questions/690341/algorithm-behind-nginx-etag-generation. Serious Performance Issues: issue 2Just listing existing constitutions. No real issue. Serious Performance Issues: issue 3Fixed with second commit, dc30799. Although I do not see, that whitspace is allowed in https://www.rfc-editor.org/rfc/rfc9110.html#name-etag. Serious Performance Issues: issue 4Not solvable and not a huge issue. |
Erase whitespace. Adjusted tests.
|
In summary, take a special look at the following and adjust for consistency.
|
|
@Spixmaster this is from a review from Copilot. Summary
Problems, reasons, and concrete fixes1) Use
|
Added last_modified() to struct FileStat. Removed condition #ifdef CPPHTTPLIB_OPENSSL_SUPPORT around ETag generation. Adjusted tests.
|
See 60d2a37. Issue 1Not needed, https://stackoverflow.com/questions/18850752/must-i-call-atomic-load-store-explicitly. Personally, I prefer not to call Issue 2Fixed. Relict from old ETag generation. Issue 3Fixed. Issue 4Fixed. Relict from od ETag generation. Issue 5I oppose this. Do it according to your liking. Issue 6The current behaviour is correct. The tests suffice. |
|
With update In summary, take a special look at the following and adjust for consistency.
|
Applied clang-tidy suggestions.
Programmed feature #2242. First off, the static file server now sends HTTP response header "ETag". Following HTTP requests by the client which include HTTP request header "If-None-Match" are only served if the value for HTTP response header "ETag" is not included in the value of HTTP request header "If-None-Match", otherwise an HTTP response with status code 304 is served which includes the HTTP response header "ETag" again that would have been sent with a normal status code of 200.
Useful resources: