A lightweight HTTP/1.1 server written in modern C++20, compliant with the Hive/42 webserv project specifications.
Webserv is our first large-scale C++ project at Hive/42. The goal was to implement a lightweight, fully working HTTP/1.1 server from scratch, relying on the modern C++ standard library wherever possible.
The server is designed to be RFC-compliant (following HTTP/1.1 [RFC 7230–7235]) and supports essential features such as:
- Parsing and validating configuration files (Nginx-style syntax).
- Handling GET, POST, DELETE with static files, autoindex, and uploads.
- Executing CGI scripts securely with proper environment and timeouts.
- Multiplexing sockets and CGI pipes in a single poll-based event loop.
- Graceful error handling, timeouts, and connection reuse (keep-alive).
For reference and correctness, Nginx was used as a behavioral benchmark: routing, error responses, and edge cases were compared against it to ensure realistic and compliant behavior.
This project was both a challenge in systems programming and a solid introduction to networking, concurrency, and protocol design in modern C++.
| Component | Responsibility |
|---|---|
| `Server` | Represents a virtual host configuration. Manages binding (host + port), server names, error pages, body size limits, and a collection of `Location` blocks. |
| `Location` | Encapsulates route-specific configuration. Defines path matching, allowed HTTP methods, root directories, index files, redirects, CGI interpreters, and upload stores. |
| `runWebserv` | Orchestrates the execution of the web server: initializes `Server` objects from the parsed configuration, launches sockets, and enters the event loop. |
- Parse CLI arguments or fall back to the default configuration.
- Load and normalize configuration (via config parser).
- Validate `Server` and `Location` objects.
- Call `runWebserv()` to start the server runtime.
The program is designed so that configuration and validation are complete before runtime begins, ensuring that only consistent and safe server objects are passed to the execution loop.
This section describes how the configuration parsing logic of Webserv works, including the step‑by‑step pipeline and the rules applied during parsing and validation.
- Component: `Tokenizer`
- Goal: Convert raw configuration text into a structured list of tokens.
- Steps:
  - Skip UTF‑8 BOM if present.
  - Ignore whitespace, line breaks, and comments (`# ...`).
  - Classify tokens into categories:
    - Keywords: `server`, `location`, `listen`, `host`, `root`, `index`, `autoindex`, `methods`, `upload_store`, `return`, `error_page`, `client_max_body_size`, `cgi_extension`.
    - Identifiers: Alphanumeric strings with `-`, `.`, `/`, `:` allowed.
    - Numbers & Units: Digits with optional single‑letter suffix (`k`, `m`, `g`).
    - Strings: Quoted values (single `'` or double `"`).
    - Symbols: `{`, `}`, `;`, `,`.
  - Detect and reject invalid characters, control characters, or malformed identifiers.
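The tokenizer steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `Tokenizer`: the `Token` and `TokenKind` types and the `tokenize` function are hypothetical names, keywords and numbers are folded into a single `Word` kind, and `_` is accepted in words on the assumption that directive names such as `upload_store` need it.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token model; the real Tokenizer's types are not shown in this README.
enum class TokenKind { Word, String, Symbol };

struct Token {
    TokenKind   kind;
    std::string text;
    int         line; // 1-based line on which the token starts
};

inline bool isSymbol(char c) { return c == '{' || c == '}' || c == ';' || c == ','; }

// Assumption: '_' is allowed so that names like upload_store form one word.
inline bool isWordChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '-' || c == '.' ||
           c == '/' || c == ':' || c == '_';
}

std::vector<Token> tokenize(const std::string& src)
{
    std::vector<Token> out;
    std::size_t i = 0;
    int line = 1;
    // Skip a UTF-8 byte order mark (EF BB BF) if present.
    if (src.compare(0, 3, "\xEF\xBB\xBF") == 0)
        i = 3;
    while (i < src.size()) {
        const char c = src[i];
        if (c == '\n') { ++line; ++i; }
        else if (std::isspace(static_cast<unsigned char>(c))) { ++i; }
        else if (c == '#') { while (i < src.size() && src[i] != '\n') ++i; } // comment
        else if (isSymbol(c)) { out.push_back({TokenKind::Symbol, std::string(1, c), line}); ++i; }
        else if (c == '"' || c == '\'') {
            const std::size_t end = src.find(c, i + 1); // matching closing quote
            if (end == std::string::npos)
                throw std::runtime_error("unterminated string at line " + std::to_string(line));
            out.push_back({TokenKind::String, src.substr(i + 1, end - i - 1), line});
            i = end + 1;
        }
        else if (isWordChar(c)) {
            const std::size_t start = i;
            while (i < src.size() && isWordChar(src[i])) ++i;
            out.push_back({TokenKind::Word, src.substr(start, i - start), line});
        }
        else
            throw std::runtime_error("invalid character at line " + std::to_string(line));
    }
    return out;
}
```

A later pass could then split `Word` tokens into keywords, identifiers, and numbers with unit suffixes.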
- Component: `ConfigParser`
- Goal: Transform token stream into structured objects (`Config`, `Server`, `Location`).
- Rules:
  - Block structure: Curly braces `{ ... }` delimit `server` and `location` blocks.
  - Directives: Each directive must end with `;` unless it opens a block.
  - Directive placement: Certain directives are only valid at specific levels:
    - Server level: `listen`, `host`, `server_name`, `error_page`, `client_max_body_size`.
    - Location level: `root`, `index`, `autoindex`, `methods`, `upload_store`, `return`, `cgi_extension`, `cgi_interpreter`.
  - Nesting: Locations may not contain other `server` blocks.
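As an illustration of the directive placement rule, a parser might consult a small lookup like the one below before accepting a directive. The `BlockLevel` enum and `isDirectiveAllowed` function are hypothetical; only the directive names come from the lists above.

```cpp
#include <set>
#include <string>

// Hypothetical helper: which block the parser is currently inside.
enum class BlockLevel { Server, Location };

// Returns true when `name` is a directive permitted at `level`,
// following the server-level and location-level lists above.
bool isDirectiveAllowed(BlockLevel level, const std::string& name)
{
    static const std::set<std::string> serverLevel = {
        "listen", "host", "server_name", "error_page", "client_max_body_size"};
    static const std::set<std::string> locationLevel = {
        "root", "index", "autoindex", "methods", "upload_store",
        "return", "cgi_extension", "cgi_interpreter"};
    const std::set<std::string>& allowed =
        (level == BlockLevel::Server) ? serverLevel : locationLevel;
    return allowed.count(name) != 0;
}
```

A misplaced directive (for example `root` directly inside `server`) would then be reported as a parse error rather than silently accepted.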
- `Server`: Represents a virtual host.
  - Holds host, port, server names, error pages, body size limits, and `Location` blocks.
- `Location`: Defines behavior for a URI path prefix.
  - Includes root directory, index file(s), autoindex flag, allowed methods, redirects, CGI settings, and upload store.
After parsing, the configuration is normalized to ensure consistency and defaults:

- Missing `client_max_body_size` → default = 1 MB.
- Missing `error_page` → add defaults for common errors (403, 404, 500, 502 → `/error.html`).
- Missing `methods` → defaults to GET, POST, DELETE.
- Locations without `root` → fallback to `/var/www` (unless redirected).
- Root location (`/`) without `index` → defaults to index.html.

Normalization guarantees that later validation and runtime logic operate on a complete and uniform model.
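The defaulting rules above might look like this in code. The `ServerCfg` and `LocationCfg` structs and `normalizeServer` are simplified, hypothetical stand-ins for the project's `Server`/`Location` classes, and 1 MB is assumed here to mean 1024 × 1024 bytes.

```cpp
#include <cstddef>
#include <initializer_list>
#include <map>
#include <string>
#include <vector>

// Simplified, hypothetical configuration model for illustration only.
struct LocationCfg {
    std::string              path;
    std::string              root;
    std::vector<std::string> index;
    std::vector<std::string> methods;
    bool                     hasRedirect = false;
};

struct ServerCfg {
    std::size_t                clientMaxBodySize = 0; // 0 means "not set"
    std::map<int, std::string> errorPages;
    std::vector<LocationCfg>   locations;
};

void normalizeServer(ServerCfg& s)
{
    if (s.clientMaxBodySize == 0)
        s.clientMaxBodySize = 1024 * 1024; // assumption: 1 MB = 1 MiB
    if (s.errorPages.empty())
        for (int code : {403, 404, 500, 502})
            s.errorPages[code] = "/error.html";
    for (LocationCfg& loc : s.locations) {
        if (loc.methods.empty())
            loc.methods = {"GET", "POST", "DELETE"};
        if (loc.root.empty() && !loc.hasRedirect)
            loc.root = "/var/www"; // redirect-only locations keep no root
        if (loc.path == "/" && loc.index.empty())
            loc.index = {"index.html"};
    }
}
```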
- Component: `validateConfig`
- Goal: Enforce semantic correctness beyond syntax.
- Checks applied:
  - Presence checks: At least one `location` per `server`.
  - Path rules: Location paths must start with `/` and not contain segments beginning with `.`.
  - Defaults: Each location must define either a `root` or `return` (but not both with CGI).
  - Server names: Must be unique per host:port, valid per RFC 1035 (no spaces, no control chars, no empty labels).
  - Ports: Only one unnamed default server per host:port pair.
  - Error pages: Codes restricted to 400–599.
  - Redirects: Only 301, 302, 303, 307, 308 allowed.
  - Methods: Only `GET`, `POST`, `DELETE` permitted.
  - Client body size: Must be > 0.
  - CGI: Extensions must start with a dot, interpreters must map 1‑to‑1 with declared extensions.
  - Roots & Upload stores: Must exist and be directories.
  - Index: Requires a valid `root`.
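A few of these checks are simple enough to sketch as standalone predicates. The function names below are hypothetical, and the real `validateConfig` presumably throws `ValidationError` with a message instead of returning `bool`.

```cpp
#include <sstream>
#include <string>

// Path rule: must start with '/' and contain no segment beginning with '.'.
bool isValidLocationPath(const std::string& path)
{
    if (path.empty() || path[0] != '/')
        return false;
    std::istringstream in(path);
    std::string segment;
    while (std::getline(in, segment, '/'))
        if (!segment.empty() && segment[0] == '.')
            return false; // rejects ".", "..", and hidden entries such as ".git"
    return true;
}

// Redirect rule: only 301, 302, 303, 307, 308 are accepted.
bool isAllowedRedirectCode(int code)
{
    return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
}

// Error page rule: codes are restricted to 400-599.
bool isAllowedErrorPageCode(int code)
{
    return code >= 400 && code <= 599;
}
```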
- Tokenizer: Throws `TokenizerError` with line/column context when encountering invalid tokens.
- Parser: Throws `ConfigParseError` on invalid structure or misplaced directives.
- Validator: Throws `ValidationError` with descriptive guidance on fixing invalid configurations.
The heart of Webserv’s I/O: a single poll() loop multiplexing listening sockets, client sockets, and CGI pipes, with strict timeouts and robust error recovery.
- Listening sockets: set up `bind()`/`listen()` for each configured host:port.
- Event loop: run non-blocking `poll()` to monitor all descriptors.
- Connections:
  - New connections → `accept()` → initialize per-client state.
  - Reads → receive → parse (supports pipelining) → route.
  - CGI → spawn, monitor pipes, enforce timeouts, finalize.
  - Writes → stream raw or file-backed responses with keep-alive and backpressure.
- Timeouts: enforce idle, header, body, and send deadlines.
- Errors: generate accurate HTTP error responses, close cleanly.
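One iteration of such a loop can be sketched with POSIX `poll()` as below. This is a reduced illustration rather than the project's `SocketManager`: `pollReadable` and `readOnce` are hypothetical helpers, only read readiness is handled, and a real loop would also register `POLLOUT`, accept new clients, and check deadlines on every pass.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cstddef>
#include <string>
#include <vector>

// Wait up to timeoutMs for any descriptor to become readable (or to hang up).
// Returns the descriptors that need attention; empty on timeout or error.
std::vector<int> pollReadable(const std::vector<int>& fds, int timeoutMs)
{
    std::vector<pollfd> pfds;
    for (int fd : fds)
        pfds.push_back({fd, POLLIN, 0});
    std::vector<int> ready;
    const int n = ::poll(pfds.data(), pfds.size(), timeoutMs);
    if (n <= 0)
        return ready; // 0 = timeout, -1 = error (a real loop would inspect errno)
    for (const pollfd& p : pfds)
        if (p.revents & (POLLIN | POLLHUP | POLLERR))
            ready.push_back(p.fd);
    return ready;
}

// Read whatever is currently available with a single read() call.
std::string readOnce(int fd)
{
    char buf[4096];
    const ssize_t n = ::read(fd, buf, sizeof buf);
    return n > 0 ? std::string(buf, static_cast<std::size_t>(n)) : std::string();
}
```

Because listening sockets, client sockets, and CGI pipes are all plain file descriptors, the same call can watch all three kinds at once.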
This section explains how Webserv processes HTTP/1.1 requests end‑to‑end, from bytes on a socket to fully formed responses, and how the server enforces protocol rules, timeouts, and connection reuse.
- Accept & Read: `SocketManager` accepts client connections on non‑blocking sockets and collects incoming bytes. Per‑connection state tracks read deadlines (header/body) and keep‑alive.
- Parse: `HttpRequestParser` incrementally parses:
  - Start line: method, request‑target (absolute‑path + optional query), HTTP version (HTTP/1.1).
  - Headers: canonicalizes keys; enforces size limits and folding rules; detects `Connection`, `Host`, `Content-Length`, `Transfer-Encoding`, etc.
  - Body: supports `Content-Length` and chunked transfer decoding. Body size is capped by `client_max_body_size`.
- Route: `requestRouter` selects a `Server` (host+SNI/server_name) and the most specific `Location` (longest URI prefix match). It normalizes the filesystem target path and determines whether the request hits static content, autoindex, redirect, upload, or CGI.
- Dispatch: Based on method and location rules, it calls `handleGet`, `handlePost`, or `handleDelete`. Unsupported or disallowed → 405 with `Allow` header.
- Build Response: `responseBuilder` produces status line, headers, and body. It:
  - Sets `Content-Type` (MIME by extension), `Content-Length` or `Transfer-Encoding: chunked`, `Connection` (keep‑alive vs close), and error pages.
  - Streams file bodies (sendfile/read+write) with backpressure; can fall back to buffered I/O for CGI and dynamic content.
- Send & Reuse: `SocketManager` writes the response, respecting write timeouts and TCP backpressure. If `Connection: keep-alive` and protocol rules allow, the connection stays open for subsequent pipelined requests.
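The "longest URI prefix match" used in the routing step can be illustrated with a short function. `selectLocation` is a hypothetical name, and it uses a plain string-prefix comparison; whether the project additionally requires the prefix to end on a path-segment boundary is not stated here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the longest location path that is a prefix of uriPath,
// or -1 when no location matches.
int selectLocation(const std::vector<std::string>& locationPaths,
                   const std::string& uriPath)
{
    int         best    = -1;
    std::size_t bestLen = 0;
    for (std::size_t i = 0; i < locationPaths.size(); ++i) {
        const std::string& prefix = locationPaths[i];
        if (prefix.empty() || uriPath.compare(0, prefix.size(), prefix) != 0)
            continue; // not a prefix of the request path
        if (best == -1 || prefix.size() > bestLen) {
            best    = static_cast<int>(i);
            bestLen = prefix.size();
        }
    }
    return best;
}
```

With locations `/`, `/images`, and `/images/thumbs`, a request for `/images/thumbs/a.png` selects the third entry, while `/index.html` falls through to `/`.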
- Static files: Path is resolved from `root` + URI, protecting against traversal. If an index is configured and exists for a directory, it is served.
- Autoindex: When enabled and no index present, `generateAutoindex` renders a minimal HTML directory listing.
- ETag/Last‑Modified (optional): If enabled, responses include validators; otherwise strong caching is avoided. Range requests are not served unless explicitly implemented.
- 400 malformed request, 413 body too large, 414 URI too long, 404/403 missing or forbidden paths, 405 method not allowed.
- 408/504 on header/body/send timeouts. 431 for oversized header sections.
- 5xx on internal faults, filesystem errors, or CGI failures (see below).
This section details how POST uploads, multipart forms, and CGI programs are handled, including sandboxing and timeout policy.
- Content dispatch: `handlePost` inspects `Content-Type` and forwards to specialized handlers.
- application/x-www-form-urlencoded: Parsed into key/value pairs. Small payloads are buffered; oversized inputs fail fast with 413.
- multipart/form-data: `handleMultipartForm` parses parts lazily to disk, honoring per‑file and aggregate size limits. Saved files go to the `upload_store` defined on the matched `Location`.
- application/octet-stream / arbitrary media: Stored as a single file in `upload_store` with a server‑generated filename when no name is provided.
- Overwrite policy: Configurable (e.g., reject on conflict or rename). Errors yield 409 (conflict) or 500 depending on the cause.
- When CGI triggers: A request is routed to CGI when the target path matches a configured `cgi_extension` (e.g., `.py`, `.php`) and an interpreter is set, or when the `Location` forces CGI.
- Environment: `handleCgi` constructs a POSIX environment per CGI/1.1:
  - `REQUEST_METHOD`, `QUERY_STRING`, `CONTENT_LENGTH`, `CONTENT_TYPE`, `SCRIPT_FILENAME`, `PATH_INFO`, `SERVER_PROTOCOL`, `SERVER_NAME`, `SERVER_PORT`, `REMOTE_ADDR`, and `HTTP_*` for forwarded headers.
  - Working directory is the script directory; stdin is the request body (streamed or buffered based on size).
- Process lifecycle:
  - Create pipes for stdin/stdout, fork, exec interpreter + script.
  - Parent polls child pipes non‑blocking with CPU/IO activity watchdogs.
  - Enforces hard timeouts (startup, read, total runtime). On violation → terminate child.
- Output parsing: CGI writes `Status: 200 OK\r\n`, arbitrary headers, blank line, then body. The server:
  - Parses CGI headers (maps/filters hop‑by‑hop), merges with server headers.
  - If `Location` header without body → treat as redirect per CGI spec.
  - Otherwise body is streamed back to the client.
- Failure mapping:
  - Exec/spawn error → 502 Bad Gateway.
  - Timeout or premature exit → 504 Gateway Timeout.
  - Malformed CGI headers → 502.
  - Script wrote nothing (unexpected EOF) → 502.
- Security & Limits:
  - Drop privileges/chroot (if configured); never inherit ambient FDs; sanitize environment.
  - Enforce max body size, max headers, max response size (protects RAM), and per‑request open‑file caps.
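The output-parsing step could be approximated as below. `CgiHead` and `parseCgiHead` are hypothetical names; the sketch assumes CRLF line endings, matches the `Status` field name case-sensitively for brevity, and leaves hop-by-hop filtering and redirect handling out. A result with `ok == false` corresponds to the "malformed CGI headers → 502" case.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical result type for the parsed head of a CGI response.
struct CgiHead {
    bool                               ok         = false; // false → map to 502
    int                                status     = 200;   // default when no Status: line
    std::map<std::string, std::string> headers;
    std::size_t                        bodyOffset = 0;     // first byte after the blank line
};

CgiHead parseCgiHead(const std::string& out)
{
    CgiHead head;
    const std::size_t end = out.find("\r\n\r\n");
    if (end == std::string::npos)
        return head; // no header terminator: malformed
    std::size_t pos = 0;
    while (pos < end) {
        const std::size_t eol  = out.find("\r\n", pos); // always <= end here
        const std::string line = out.substr(pos, eol - pos);
        pos = eol + 2;
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos || colon == 0)
            return head; // not a "Name: value" line
        const std::string name  = line.substr(0, colon);
        const std::size_t v     = line.find_first_not_of(" \t", colon + 1);
        const std::string value = (v == std::string::npos) ? "" : line.substr(v);
        if (name == "Status") {
            head.status = std::atoi(value.c_str()); // "404 Not Found" → 404
            if (head.status < 100 || head.status > 599)
                return head;
        } else {
            head.headers[name] = value;
        }
    }
    head.bodyOffset = end + 4;
    head.ok         = true;
    return head;
}
```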
- GET: Serves static files, autoindex pages, or dispatches to CGI. Conditional GETs (If‑Modified‑Since/If‑None‑Match) may be supported depending on build settings.
- DELETE: Removes targeted file from the resolved root when allowed in `methods`. On success → 204 No Content; on missing/forbidden → 404/403.
- Centralizes status line + headers, error page selection, and body streaming. Ensures `Content-Length` vs `chunked` consistency and keeps connection semantics correct across errors and CGI boundaries.
This is the complete lifecycle from configuration to bytes on the wire, aligned with the current codebase.
- Startup & Configuration
  - Tokenizer → ConfigParser → normalizeConfig → validateConfig
    - Tokenize config, build `Server`/`Location` graphs, apply defaults (client body size, methods, roots, index, error pages), and enforce semantic rules (paths, redirects, methods, CGI mapping).
  - Bootstrap
    - Instantiate `Server` objects, bind/listen on configured host:port pairs, pre‑compute route tables and error pages.
- Event Loop (SocketManager)
  - Single non‑blocking `poll()` loop over listening sockets, client sockets, and CGI pipes.
  - Per‑connection state tracks read/write buffers, deadlines (header/body/send), and keep‑alive.
  - Accept new connections ➜ initialize state.
- Read → Parse (HttpRequestParser)
  - Accumulate bytes until `"\r\n\r\n"` (header terminator) is found.
  - Start line: validate method token, request‑target, version.
  - Headers: normalize keys, reject duplicates where disallowed, check `Content-Length`/`Transfer-Encoding` (conflict, format), enforce `Host` on HTTP/1.1, cap header section size.
  - URL/Host routing hint: derive effective `Url` and matched server affinity; store `Host`, `Query`, `Content-Length`.
  - Body:
    - If `Transfer-Encoding: chunked` ➜ incremental chunk decoding; forbid trailers; enforce `client_max_body_size`.
    - Else if `Content-Length` ➜ wait until full body; enforce size cap; detect pipelined next request beyond the declared length.
    - GET/DELETE: treat any extra bytes as pipeline, not body.
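The chunked branch can be sketched as a function that re-scans the buffered bytes and reports whether the body is complete. This is a simplified, non-incremental illustration: `ChunkStatus` and `decodeChunked` are hypothetical names, chunk extensions after `;` are ignored, and trailers are rejected as described above.

```cpp
#include <cstddef>
#include <string>

enum class ChunkStatus { Complete, Incomplete, Invalid, TooLarge };

// Value of a hex digit, or -1 if c is not one.
inline int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes `in` (bytes received after the header section) into `body`.
// maxBody plays the role of client_max_body_size.
ChunkStatus decodeChunked(const std::string& in, std::size_t maxBody, std::string& body)
{
    body.clear();
    std::size_t pos = 0;
    for (;;) {
        const std::size_t eol = in.find("\r\n", pos);
        if (eol == std::string::npos)
            return ChunkStatus::Incomplete; // size line not fully received yet
        std::size_t size = 0;
        std::size_t i    = pos;
        if (i == eol || hexValue(in[i]) < 0)
            return ChunkStatus::Invalid;
        for (; i < eol && hexValue(in[i]) >= 0; ++i) {
            size = size * 16 + static_cast<std::size_t>(hexValue(in[i]));
            if (size > maxBody)
                return ChunkStatus::TooLarge; // also guards against overflow
        }
        if (i != eol && in[i] != ';')
            return ChunkStatus::Invalid;
        pos = eol + 2;
        if (size == 0) { // last chunk; trailers are forbidden, so expect CRLF now
            if (in.size() < pos + 2)
                return ChunkStatus::Incomplete;
            return in.compare(pos, 2, "\r\n") == 0 ? ChunkStatus::Complete
                                                    : ChunkStatus::Invalid;
        }
        if (body.size() + size > maxBody)
            return ChunkStatus::TooLarge;
        if (in.size() < pos + size + 2)
            return ChunkStatus::Incomplete;
        if (in.compare(pos + size, 2, "\r\n") != 0)
            return ChunkStatus::Invalid;
        body.append(in, pos, size);
        pos += size + 2;
    }
}
```

`TooLarge` would map to 413 and `Invalid` to 400 in the error mapping.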
- Routing (requestRouter)
  - Directory‑slash redirect when target resolves to a directory but URI lacks trailing `/`.
  - Location selection: exact match, else longest prefix.
  - Configured redirect (`return` 301/302/307/308) short‑circuit.
  - Method gate:
    - 501 if method not implemented (only GET/POST/DELETE supported).
    - 405 if not allowed by `Location`’s `methods`.
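The method gate reduces to two lookups, sketched here with hypothetical helper names. `methodGate` returns 0 when the request may proceed; `allowHeaderValue` builds the value for the `Allow` header that accompanies a 405.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns 0 when the request may proceed, otherwise the HTTP status to send.
int methodGate(const std::string& method, const std::vector<std::string>& allowed)
{
    static const std::vector<std::string> implemented = {"GET", "POST", "DELETE"};
    if (std::find(implemented.begin(), implemented.end(), method) == implemented.end())
        return 501; // the server does not implement this method at all
    if (std::find(allowed.begin(), allowed.end(), method) == allowed.end())
        return 405; // implemented, but not permitted by this Location
    return 0;
}

// Joins the allowed methods for the Allow header, e.g. "GET, POST".
std::string allowHeaderValue(const std::vector<std::string>& allowed)
{
    std::string value;
    for (const std::string& m : allowed) {
        if (!value.empty())
            value += ", ";
        value += m;
    }
    return value;
}
```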
- Dispatch (methodsHandler)
  - GET
    - Resolve physical path under `root` (no traversal, no symlinks).
    - If directory:
      - If index exists ➜ serve file.
      - Else if `autoindex on` ➜ generate HTML listing.
      - Else ➜ 403.
    - If regular file ➜ serve with MIME type detection. Small files buffered, large files streamed.
  - POST
    - Preconditions: non‑empty body, size ≤ `client_max_body_size`, `upload_store` configured.
    - Determine safe target path under `upload_store` (percent‑decode, canonicalize, reject symlinks, `mkdir -p`).
    - Content‑type switch:
      - `multipart/form-data` ➜ stream first file part to disk (boundary parsing, per‑part size cap).
      - `application/x-www-form-urlencoded` ➜ parse kv pairs; persist rendered HTML summary.
      - Other types ➜ raw body saved as a file.
    - 201 on success with minimal HTML confirmation.
  - DELETE
    - Resolve path; reject directories/symlinks; remove regular file; reply 200 with HTML confirmation.
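The "resolve under `root`, no traversal" step shared by all three handlers can be approximated with `std::filesystem`. `resolveUnderRoot` is a hypothetical helper that works purely lexically: it does not percent-decode the URI and does not detect symlinks, both of which the real handlers are described as doing separately.

```cpp
#include <algorithm>
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Joins root and the request path, collapses "." and "..", and returns the
// result only if it still lies inside root. Lexical only: no disk access.
std::optional<fs::path> resolveUnderRoot(const fs::path& root, const std::string& uriPath)
{
    fs::path base = root.lexically_normal();
    if (!base.has_filename())
        base = base.parent_path(); // drop a trailing separator ("/var/www/")
    const fs::path rel       = fs::path(uriPath).relative_path(); // strip leading '/'
    const fs::path candidate = (base / rel).lexically_normal();
    // Every component of base must be a leading component of candidate.
    const auto firstDiff =
        std::mismatch(base.begin(), base.end(), candidate.begin(), candidate.end()).first;
    if (firstDiff != base.end())
        return std::nullopt; // escaped the root, e.g. via ".."
    return candidate;
}
```

A `std::nullopt` result would typically be answered with 403 or 404 rather than touching the filesystem.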
- CGI (handleCgi) - when location/extension triggers
  - Spawn
    - Write request body to temp file; create output temp file.
    - Build `execve` argv (interpreter + script) and CGI/1.1 env (`REQUEST_METHOD`, `QUERY_STRING`, `SCRIPT_FILENAME`, `PATH_INFO`, `SERVER_*`, `HTTP_*`, etc.).
    - `fork()` child ➜ `dup2(stdin/out)` to temp fds ➜ `chdir(script dir)` ➜ `execve()`.
  - Supervision
    - Parent polls pipes/fds with timeouts; on inactivity/overrun ➜ kill and 504/502.
  - Finalize
    - Parse output file head for CGI headers (`Status:`, `Content-Type:`) until `CRLF CRLF`.
    - Compute body offset and size, then return a file‑backed response pointing at CGI output (no copy), with correct status and content type.
    - Ensure temp files are unlinked/cleaned after send.
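Building the environment strings for `execve` might look like the sketch below. `cgiHeaderVar` and `buildCgiEnv` are hypothetical names; the `HTTP_*` naming (upper-case, `-` replaced by `_`) follows the CGI/1.1 convention. A fuller version would skip `Content-Type` and `Content-Length`, which CGI passes as `CONTENT_TYPE` and `CONTENT_LENGTH` instead.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <vector>

// "User-Agent" → "HTTP_USER_AGENT"
std::string cgiHeaderVar(const std::string& headerName)
{
    std::string out = "HTTP_";
    for (char c : headerName)
        out += (c == '-') ? '_'
                          : static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return out;
}

// Produces "KEY=VALUE" strings: fixed meta-variables first, then request headers.
std::vector<std::string> buildCgiEnv(const std::map<std::string, std::string>& meta,
                                     const std::map<std::string, std::string>& headers)
{
    std::vector<std::string> env;
    for (const auto& kv : meta)
        env.push_back(kv.first + "=" + kv.second);
    for (const auto& kv : headers)
        env.push_back(cgiHeaderVar(kv.first) + "=" + kv.second);
    return env;
}
```

The resulting strings can be exposed to `execve` as a null-terminated array of `char*` pointing into the vector's elements.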
- Response Building (responseBuilder/HttpResponse)
  - Build status line + headers; choose reason phrase; select custom error page if configured.
  - Set `Content-Type`, `Content-Length` (or stream file length) and connection semantics.
  - Keep‑alive policy
    - HTTP/1.1: keep‑alive by default unless `Connection: close` or fatal status (e.g., 400/408/413/500) forces close.
    - HTTP/1.0: close by default unless `Connection: keep-alive`.
  - For redirects: set `Location`; body often omitted/minimal.
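The keep-alive policy above fits in a few lines. `forcesClose` and `shouldKeepAlive` are hypothetical names, the close-forcing status list contains only the examples mentioned above, and the `Connection` header is treated as a single token for simplicity.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Statuses after which the connection is closed regardless of headers.
bool forcesClose(int status)
{
    return status == 400 || status == 408 || status == 413 || status == 500;
}

// connection: value of the request's Connection header ("" if absent).
bool shouldKeepAlive(bool isHttp11, std::string connection, int status)
{
    if (forcesClose(status))
        return false;
    std::transform(connection.begin(), connection.end(), connection.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (isHttp11)
        return connection != "close";   // HTTP/1.1: persistent by default
    return connection == "keep-alive";  // HTTP/1.0: opt-in only
}
```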
- Write → Reuse/Close
  - Non‑blocking writes honor backpressure and send timeouts.
  - If `keep-alive` and no close‑forcing status ➜ retain connection for next pipelined request (parser resumes at leftover bytes).
  - Else ➜ close socket and release all per‑connection resources.
- Error Mapping & Hardening
  - Parser/Router/FS/CGI errors mapped to precise HTTP codes (400/403/404/405/408/411/413/414/415/431/500/501/502/504/505).
  - Safeguards: normalized paths, no `..`, symlink denial, header/body caps, per‑request timeouts, upload store confinement, and strict header validation.
This project leverages GitHub Actions to ensure code quality, stability, and up-to-date documentation.
- Runs automatically on pushes and pull requests to `main` and `dev`.
- Includes manual triggers (`workflow_dispatch`) and dependency checks after successful builds.
Jobs Overview:
| Job | Description |
|---|---|
| 🔨 Build | Compiles the project using the provided Makefile to ensure successful builds. |
| 🧪 Test | Builds the server, runs Python test suite against a live instance, and captures logs on failure. |
| 📚 Docs | Generates Doxygen documentation (with Graphviz diagrams) and deploys it to GitHub Pages. |
Every code change is built, tested, and documented automatically, ensuring a robust development workflow and always-available reference docs.
This section describes how project documentation is generated, structured, and published.
- Documentation is generated automatically from source code comments and Markdown files.
- `README.md` serves as the entry point, offering an overview and links to modules.
- Graphviz integration produces:
  - Class diagrams to illustrate object hierarchies.
  - Call graphs to visualize execution flow.
  - Dependency graphs to map relationships between modules.
- These visuals improve comprehension of the server’s architecture.
- The source browser cross-references functions, classes, and files.
- Each documented entity links directly to its definition in the codebase.
- Groups (`@defgroup`, `@ingroup`) provide thematic navigation across modules (e.g., `config`, `core`, `http`).
- Documentation is built in CI/CD pipelines.
- Published automatically via GitHub Pages from the `docs/html` directory.
- Ensures the latest version is always available for contributors and maintainers.
- Consistent Doxygen-style headers across `.hpp` and `.cpp` files.
- Markdown files complement code documentation with high-level design notes and workflow explanations.
- Together, these guarantee both low-level API reference and high-level architectural guidance.
webserv
├── 📁 .github/ # GitHub Actions CI workflows and PR/issue templates
│ └── workflows/
│ ├── ci.yml # CI workflow: builds with Makefile
│ └── docs.yml # Doxygen documentation generation & GitHub Pages deploy
├── 📁 include/ # All public project headers, grouped by module (config, http, core, etc.)
├── 📁 src/ # Source files, mirrors the include/ structure
├── 📁 test_webserv/ # Unit tests
├── 📁 configs/ # Default config file
├── 📁 docs/ # Documentation generated by doxygen
├── .clang-format # Enforces formatting rules (4-space indent, K&R braces, etc.)
├── .editorconfig # Shared IDE/editor config for consistent style
├── .gitattributes # Defines merge/diff rules for Git (e.g. binary files)
├── .gitignore # Files and folders ignored by Git (e.g. build/, *.o)
├── ACTIONPLAN.md # Project-level planning/roadmap
├── DOXYGENSTYLEGUIDE.md # Doxygen conventions for documenting code
├── Doxyfile # Main config for Doxygen documentation generation
├── LICENSE # Project license
├── Makefile # Build system entry point
├── README.md # Main README
├── STYLEGUIDE.md # Coding conventions for naming, layout, formatting
├── run_test.py # Entrypoint for python tests
├── webserv.subject.pdf # Original subject specification for the project

    make
    ./bin/webserv <path-to-config.conf>

The default goal is `all`. The binary is produced at `bin/webserv`.

| Command | Description |
|---|---|
| `make` | Build the project in release mode (C++20, `-O3 -flto -DNDEBUG -march=native`). |
| `make re` | Clean everything and rebuild from scratch. |
| `make clean` | Remove object files and dependency files in `objs/`. |
| `make fclean` | Remove the executable, `bin/`, and all build artifacts (also runs `clean`). |
| `make install_test_deps` | Create a local Python venv in `.venv/` and install `requirements-test.txt`. |
| `make test` | Build, start the server in background with `./test_webserv/tester/config/tester.conf`, run `run_test.py`, then stop the server. |
| `make format` | Run `clang-format -i` on all listed sources and headers. |
| `make help` | Print a categorized list of available targets. |

- Objects and auto-generated deps are stored under `objs/` (built via `-MMD -MP`).
- The build uses explicit source lists (no wildcards) for deterministic builds.
- The test rule writes the PID to `.webserv_test.pid` and cleans it up on success/failure.
- Ensure `python3-venv` and `clang-format` are installed on your system.

This project is licensed under the terms of the MIT License.