
refactor: split GitLab parser into focused modules #15

Merged
PI-Victor merged 5 commits into main from feat/CLO-139 on Apr 7, 2026

Conversation

@interstella5555
Contributor

Summary

  • split the GitLab parser entry point into focused include resolution, normalization, job parsing, and test modules
  • move include loading and include:project fetching behind a dedicated IncludeResolver
  • keep YAML merge / !reference normalization separate from graph and job parsing logic

Why

The old crates/opal/src/gitlab/parser.rs mixed file loading, include traversal, normalization, graph building, and tests in one monolithic module. This refactor reduces coupling without changing the supported GitLab behavior.

Closes CLO-139
Linear: https://linear.app/cloudflavor/issue/CLO-139/split-gitlab-parser-into-include-normalization-and-job-parsing-modules

Validation

  • cargo fmt --all
  • cargo test -p opal gitlab::parser --lib
  • cargo check -p opal
  • cargo test -p opal --lib
  • cargo clippy -p opal --all-targets -- -D warnings
  • Opal MCP rust-checks slice: Success (gitlab-ci-f7175817)

@cloudflare-workers-and-pages
Contributor

cloudflare-workers-and-pages bot commented Apr 7, 2026

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status: ✅ Deployment successful!
Name: opal
Latest Commit: a11b181
Updated (UTC): Apr 07 2026, 07:34 PM

Split the GitLab parser entry point into include resolution, normalization, job parsing, and dedicated tests without changing supported behavior.
```rust
let project_id = percent_encode(project);
let file_id = percent_encode(&relative.to_string_lossy());
let ref_id = percent_encode(reference);
let url =
```
Member

@interstella5555 remove the curl command. add a proper way to execute http requests using a library

Contributor Author

Current code is up to date here: the include fetch path now uses reqwest::Client in IncludeIo::fetch_project_file, so the shell curl path is gone.

```rust
stack.push(path.to_path_buf());

let content =
    fs::read_to_string(path).with_context(|| format!("failed to read {:?}", path))?;
```
Member

@interstella5555 separate fs IO from this function such that it can be tested.
also, this does not use async

Contributor Author

Current code is up to date here: filesystem and remote IO are now split behind IncludeIo, and the resolver uses async tokio::fs helpers instead of synchronous file access.
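The testability seam described in this reply can be illustrated with a trait. This is a minimal synchronous sketch with hypothetical names (`IncludeSource`, `MemSource`); the PR's actual `IncludeIo` is async over `tokio::fs` and `reqwest`:

```rust
use std::collections::HashMap;
use std::io;

// Hypothetical seam: the resolver depends on this trait rather than on
// std::fs directly, so tests can substitute an in-memory implementation.
trait IncludeSource {
    fn read_file(&self, path: &str) -> io::Result<String>;
}

// Production-style implementation backed by the real filesystem.
struct FsSource;
impl IncludeSource for FsSource {
    fn read_file(&self, path: &str) -> io::Result<String> {
        std::fs::read_to_string(path)
    }
}

// Test double: "files" are just entries in a map.
struct MemSource(HashMap<String, String>);
impl IncludeSource for MemSource {
    fn read_file(&self, path: &str) -> io::Result<String> {
        self.0
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.to_string()))
    }
}

// Resolver logic written against the trait is testable without touching disk.
fn load_include(io: &dyn IncludeSource, path: &str) -> io::Result<String> {
    io.read_file(path)
}
```

The same shape extends to the remote side: a second trait method (or a second trait) for project-file fetching lets tests exercise include traversal without network access.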

```diff
@@ -0,0 +1,430 @@
use crate::{GitLabRemoteConfig, env, runtime};
```
Member

@interstella5555 this file does not use async, refactor it to use async. do not use spawn_blocking or spawn unless you have a good reason to.

Contributor Author

Updated in 69737ad: include_resolver.rs is now async end-to-end, with async filesystem access through tokio::fs and remote fetching through reqwest::Client, without adding spawn or spawn_blocking.

Replace the include resolver curl subprocess with async reqwest, move resolver I/O behind a dedicated helper, and thread async pipeline loading through the parser and plan call sites.
@cloudflavor cloudflavor deleted a comment from PI-Victor Apr 7, 2026
@cloudflavor cloudflavor deleted a comment from PI-Victor Apr 7, 2026
@cloudflavor cloudflavor deleted a comment from PI-Victor Apr 7, 2026
}

async fn canonicalize(&self, path: &Path) -> Result<PathBuf> {
tokio::fs::canonicalize(path)
Member

use proper imports

Contributor Author

Updated in 69737ad: I switched this module to top-level imports, including tokio::fs and the aliased serde_yaml imports, instead of repeating module-qualified calls.

```diff
@@ -88,7 +88,8 @@ impl ExecutorCore {
// TODO: this shit does way too much, hard to test if you add fs::create inside of it
```
Member

@interstella5555 did you address the concerns in this todo?
this function mixes a lot of concerns

Contributor Author

Updated in 5603857: I pulled the directory setup and executor env construction out of ExecutorCore::new into prepare_executor_directories[_at] and build_executor_env[_from], and added direct unit coverage for those extracted paths.
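The extraction described in this reply can be sketched as two free functions. The signatures, the `workspace`/`artifacts` layout, and the env keys here are illustrative assumptions, not the PR's exact code (the GitLab variable names `CI_PROJECT_DIR` and `CI_JOB_NAME` are real, but their use here is for shape only):

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Directory setup pulled out of the constructor: takes an explicit base path,
// so tests can point it at a temp dir and assert on the result.
fn prepare_executor_directories_at(base: &Path) -> io::Result<(PathBuf, PathBuf)> {
    let workspace = base.join("workspace");
    let artifacts = base.join("artifacts");
    fs::create_dir_all(&workspace)?;
    fs::create_dir_all(&artifacts)?;
    Ok((workspace, artifacts))
}

// Env construction pulled out the same way: pure input -> output, trivially
// unit-testable without constructing the whole executor.
fn build_executor_env_from(workspace: &Path, job: &str) -> HashMap<String, String> {
    let mut env = HashMap::new();
    env.insert("CI_PROJECT_DIR".into(), workspace.display().to_string());
    env.insert("CI_JOB_NAME".into(), job.to_string());
    env
}
```

With both helpers free-standing, the constructor shrinks to wiring, which is the concern the TODO was flagging.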


```rust
for (key, value) in mapping {
    match key {
        Value::String(name) if name == "before_script" => {
```
Member

this should be an enum, then the conversion should be much easier to do and more ergonomic. also, this should be easy to extend, because this represents the features of the gitlab pipeline

Contributor Author

Updated in 69737ad: the parser dispatch here is now typed with enums (PipelineKeyword, DefaultKeyword, JobSectionKey) rather than raw string matching, so extending supported GitLab keys is more direct.
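A minimal sketch of the enum-backed dispatch this reply refers to, using a hypothetical `PipelineKeyword` with a handful of real GitLab top-level keys (the PR's actual variant set may differ):

```rust
// Each supported top-level GitLab key becomes a variant, so extending the
// parser means adding a variant plus a match arm, not another string compare.
#[derive(Debug, PartialEq, Eq)]
enum PipelineKeyword {
    Stages,
    Default,
    Include,
    Workflow,
    Variables,
    BeforeScript,
}

impl PipelineKeyword {
    fn parse(s: &str) -> Option<Self> {
        match s {
            "stages" => Some(Self::Stages),
            "default" => Some(Self::Default),
            "include" => Some(Self::Include),
            "workflow" => Some(Self::Workflow),
            "variables" => Some(Self::Variables),
            "before_script" => Some(Self::BeforeScript),
            _ => None, // anything else is treated as a job name
        }
    }
}
```

The payoff is exhaustiveness: once the mapping key is converted to the enum, the compiler forces every variant to be handled at each dispatch site.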


```rust
use super::merge_mappings;

type ResolverFuture<'a, T> = Pin<Box<dyn Future<Output = Result<T>> + Send + 'a>>;
```
Member

why's this necessary?

Contributor Author

This extra indirection is gone in 69737ad; I simplified the code path and kept the direct form instead of carrying the additional helper/alias forward.
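For context on why a boxed-future alias like this typically shows up at all: a recursive `async fn` cannot return a plain `impl Future`, because the future type would contain itself, so resolvers that recurse into nested includes box the recursive call. A self-contained sketch (hypothetical `depth` function, plus a minimal busy-poll executor that is only valid for futures that never return `Pending`):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal no-op waker so we can poll without a real runtime.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Busy-poll executor: fine for always-ready futures, not for real IO.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

// The recursion must go through Box::pin; `async fn depth(..)` calling
// itself directly would have an infinitely sized future type.
fn depth(n: u32) -> Pin<Box<dyn Future<Output = u32> + Send>> {
    Box::pin(async move {
        if n == 0 { 0 } else { depth(n - 1).await + 1 }
    })
}
```

When the recursion is removed or flattened, as this reply says happened in 69737ad, the boxed alias loses its reason to exist.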

```rust
}

async fn try_exists(&self, path: &Path) -> Result<bool> {
    tokio::fs::try_exists(path)
```
Member

jesus, use tokio import up top for all of the below cases

Contributor Author

Updated in 69737ad: the repeated tokio::fs::... calls were collapsed to a single top-level use tokio::fs; import and fs::... call sites.

```rust
let response = self
    .client
    .get(&url)
    .header("PRIVATE-TOKEN", gitlab.token.as_str())
```
Member

use context7, find out what the correct path is for this and how to set the token.

Contributor Author

Updated in 69737ad: I verified the GitLab repository file raw endpoint and PRIVATE-TOKEN header via Context7, then moved that into gitlab_repository_file_raw_url(...) plus GITLAB_PRIVATE_TOKEN_HEADER so the contract is explicit in code.
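The contract this reply describes can be sketched as follows. The endpoint shape (`GET /api/v4/projects/:id/repository/files/:file_path/raw?ref=...`) and the `PRIVATE-TOKEN` header name match GitLab's documented repository files API; the function names mirror the reply, and the hand-rolled percent-encoder stands in for whatever encoding crate the real code uses:

```rust
const GITLAB_PRIVATE_TOKEN_HEADER: &str = "PRIVATE-TOKEN";

// Minimal RFC 3986 percent-encoder: keeps unreserved characters, escapes the
// rest. GitLab requires the project path and file path to be URL-encoded,
// so "group/proj" becomes "group%2Fproj".
fn percent_encode(raw: &str) -> String {
    raw.bytes()
        .map(|b| match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                (b as char).to_string()
            }
            other => format!("%{:02X}", other),
        })
        .collect()
}

// Builds the raw-file URL so the endpoint contract lives in one place.
fn gitlab_repository_file_raw_url(
    base: &str,
    project: &str,
    file: &str,
    reference: &str,
) -> String {
    format!(
        "{}/api/v4/projects/{}/repository/files/{}/raw?ref={}",
        base.trim_end_matches('/'),
        percent_encode(project),
        percent_encode(file),
        percent_encode(reference),
    )
}
```

Centralizing the URL and header constant means the request-building code (the `reqwest` call site quoted above) carries no endpoint knowledge of its own.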

```rust
    return Ok(None);
};
let rules: Vec<JobRule> =
    serde_yaml::from_value(rules_value.clone()).context("failed to parse workflow.rules")?;
```
Member

use top imports

Contributor Author

Updated in 69737ad: this module now uses top-level aliased imports for the repeated serde_yaml calls (yaml_from_str, yaml_from_value).

```rust
fn parse_job(value: Value) -> Result<ParsedJobSpec> {
    match value {
        Value::Mapping(mut map) => {
            let image_value = map.remove(Value::String("image".to_string()));
```
Member

same comment about enums

Contributor Author

Updated in 69737ad: this path now uses typed enums for the relevant parser states instead of raw string matching, so extending the supported GitLab keys stays localized.

```rust
    }
}

fn build_graph(
```
Member

break this function down, separate the logic into smaller ones, it does too many things

Contributor Author

Updated in 69737ad: build_graph is now just the coordinator; the work is split across GraphBuilder, build_job, and the smaller resolve_* helpers so the previous single large function is gone.
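The coordinator-plus-helpers shape this reply describes can be sketched in miniature. Everything here is illustrative (the real `GraphBuilder` and `resolve_*` helpers deal with needs, stages, rules, and defaults); only the default stage name `"test"` is real GitLab behavior:

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
struct Job {
    name: String,
    stage: String,
}

// One focused resolve_* helper: stage defaulting in isolation.
// GitLab defaults a job's stage to "test" when none is declared.
fn resolve_stage(declared: Option<&str>) -> String {
    declared.unwrap_or("test").to_string()
}

// Per-job construction, separated from graph-wide concerns.
fn build_job(name: &str, stage: Option<&str>) -> Job {
    Job { name: name.to_string(), stage: resolve_stage(stage) }
}

// The coordinator: no per-job logic of its own, just iteration and wiring.
fn build_graph(specs: &[(&str, Option<&str>)]) -> BTreeMap<String, Job> {
    specs
        .iter()
        .map(|(name, stage)| (name.to_string(), build_job(name, *stage)))
        .collect()
}
```

Each helper can then be unit-tested against one concern, which is what the review comment was asking for.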

```rust
    inherit.interruptible = *value;
}
RawInheritDefault::List(entries) => {
    inherit.image = entries.iter().any(|entry| entry == "image");
```
Member

would benefit from enum

Contributor Author

Updated in 69737ad: this branch was converted to enum-backed parsing as part of the parser cleanup, so it no longer relies on the previous stringly dispatch.
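GitLab's `inherit:default` accepts either a boolean or a list of default keys, which maps naturally onto a two-variant enum, as the snippet above (`RawInheritDefault::List`) suggests. A stdlib-only sketch with an illustrative name, leaving out the serde deserialization the real code would carry:

```rust
// Either "inherit all defaults / none" or "inherit only these named keys".
#[derive(Debug)]
enum InheritDefault {
    All(bool),
    Keys(Vec<String>),
}

impl InheritDefault {
    // One method replaces the scattered per-key boolean assignments:
    // the match is exhaustive, so a new variant can't be silently ignored.
    fn includes(&self, key: &str) -> bool {
        match self {
            InheritDefault::All(enabled) => *enabled,
            InheritDefault::Keys(keys) => keys.iter().any(|k| k == key),
        }
    }
}
```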

```rust
    Ok(())
}

fn parse_artifact_when(value: Option<&str>, job_name: &str) -> Result<ArtifactWhen> {
```
Member

make an enum out of states as well.

Contributor Author

Updated in 69737ad: the state parsing here is now typed through ArtifactWhenKeyword before converting to ArtifactWhen.
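A sketch of the typed parsing this reply describes. The three accepted values (`on_success` as the default, `on_failure`, `always`) are GitLab's documented `artifacts:when` states; the error type and exact conversion path are simplified assumptions:

```rust
#[derive(Debug, PartialEq, Eq)]
enum ArtifactWhen {
    OnSuccess,
    OnFailure,
    Always,
}

// Absent value falls back to GitLab's default, on_success; anything outside
// the documented set is rejected with the job name for context.
fn parse_artifact_when(value: Option<&str>, job_name: &str) -> Result<ArtifactWhen, String> {
    match value {
        None | Some("on_success") => Ok(ArtifactWhen::OnSuccess),
        Some("on_failure") => Ok(ArtifactWhen::OnFailure),
        Some("always") => Ok(ArtifactWhen::Always),
        Some(other) => Err(format!("job {job_name}: unknown artifacts:when {other:?}")),
    }
}
```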

@PI-Victor PI-Victor marked this pull request as ready for review April 7, 2026 19:15
```rust
use tracing::warn;

use super::super::{
    graph::{
```
Member

flatten this import

Contributor Author

Updated in a11b181: flattened this import so graph and rules are imported separately at the top level instead of through the nested grouped import.

@PI-Victor PI-Victor merged commit ae21234 into main Apr 7, 2026
1 check passed
@PI-Victor PI-Victor deleted the feat/CLO-139 branch April 7, 2026 19:38