parse_html ignoring white-spaces and newlines for <pre><code> ... </code></pre> html #107
Open
Description
When using parse_html(), it seems that <pre><code> sections are only parsed correctly when they contain only text nodes and no nested tags. As soon as a nested HTML element such as <span>...</span> is used, the formatting is lost: spaces and newlines (and probably tabs, too) are dropped.
After checking at fefit/rphtml#4 that the parser works according to the HTML specification, I now think that the error I'm seeing comes from process_node(...).
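To make the expected behavior concrete, here is a minimal, self-contained sketch of whitespace handling that stays correct for nested tags. It does not use sauron's types: `Html`, `render`, `el`, `text` and `render_to_string` are hypothetical names invented for this illustration, and the real process_node(...) may be structured quite differently. The idea it shows is that "inside <pre>" is passed down the recursion as a flag, so it cannot be lost after the first nested element.

```rust
// Hypothetical node type for illustration only; sauron's real `Node` differs.
enum Html {
    Text(String),
    Element { tag: String, children: Vec<Html> },
}

/// Serializes `node` into `out`.
/// Whitespace-only text is dropped unless a <pre> ancestor has been seen.
fn render(node: &Html, in_pre: bool, out: &mut String) {
    match node {
        Html::Text(t) => {
            if in_pre || !t.trim().is_empty() {
                out.push_str(t);
            }
        }
        Html::Element { tag, children } => {
            // Once inside <pre>, the flag stays set for every descendant.
            let in_pre = in_pre || tag.as_str() == "pre";
            out.push('<');
            out.push_str(tag);
            out.push('>');
            for child in children {
                render(child, in_pre, out);
            }
            out.push_str("</");
            out.push_str(tag);
            out.push('>');
        }
    }
}

// Small constructors to keep the example tree readable.
fn el(tag: &str, children: Vec<Html>) -> Html {
    Html::Element { tag: tag.to_string(), children }
}

fn text(s: &str) -> Html {
    Html::Text(s.to_string())
}

fn render_to_string(node: &Html) -> String {
    let mut out = String::new();
    render(node, false, &mut out);
    out
}

fn main() {
    let doc = el("div", vec![
        text("\n"), // outside <pre>: dropped
        el("pre", vec![el("code", vec![
            text("\n"),
            el("span", vec![text("a")]),
            text("\n  "), // inside <pre>: kept, even between two tags
            el("span", vec![text("b")]),
            text("\n"),
        ])]),
        text("\n"),
    ]);
    println!("{:?}", render_to_string(&doc));
}
```

With this approach the whitespace between the two spans survives because the decision depends only on the ancestor chain, not on which sibling was processed before.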
I added the tests below to html_parser_test.rs.
After a bunch of tests I discovered:
- When the first element in the <pre><code> is a string, formatting works correctly.
- If the next element is a <p>2</p> (so any tag), it still works.
- However, if the next one after that is again a tag like <p>3</p>, it fails to indent, but it works with a text node. So the tag parser seems to introduce some state.
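One reading that seems consistent with the results below is that text nodes containing only whitespace are discarded, even inside <pre><code>. In the failing test the only text between the spans is a newline plus indentation, while in the passing tests every text node between tags also carries a digit. The sketch below models that hypothesis with made-up types (`Child` and `render` are not sauron code, and the two-space indentation is an assumption). It is only meant to show that such a rule would reproduce the reported output.

```rust
// Hypothetical model, not sauron's code: the children of <code> as a flat list.
#[derive(Debug, Clone, PartialEq)]
enum Child {
    Text(String),
    Element(String), // an already-rendered element, e.g. "<span>asdf</span>"
}

/// Concatenates the children, optionally dropping whitespace-only text nodes.
fn render(children: &[Child], drop_blank_text: bool) -> String {
    let mut out = String::new();
    for child in children {
        match child {
            // The suspected bug: blank text is skipped regardless of <pre>.
            Child::Text(t) if drop_blank_text && t.trim().is_empty() => {}
            Child::Text(t) => out.push_str(t),
            Child::Element(html) => out.push_str(html),
        }
    }
    out
}

fn main() {
    let span = || Child::Element("<span>asdf</span>".to_string());
    // Shaped like the failing test: only whitespace between the tags.
    // The exact indentation ("\n  ") is assumed.
    let spans = vec![
        Child::Text("\n0\n".into()),
        span(),
        Child::Text("\n  ".into()),
        span(),
        Child::Text("\n  ".into()),
        span(),
        Child::Text("\n".into()),
    ];
    // Shaped like the mixed test: every text node contains a digit.
    let mixed = vec![
        Child::Text("\n0\n".into()),
        Child::Element("<p>1</p>".into()),
        Child::Text("\n2\n".into()),
        Child::Element("<p>3</p>".into()),
        Child::Text("\n4\n".into()),
    ];
    println!("{:?}", render(&spans, true));
    println!("{:?}", render(&mixed, true));
}
```

Under this model the first list loses its line breaks between the spans, while the second is rendered unchanged, which would explain why text nodes appear to "reset" the behavior.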
test 1
#[test]
fn test_pre_code() {
let html = r#"<div><p> test </p>
<pre><code>
0
1
<p>foo</p>
2
3</code></pre>
</div>"#;
let expected = r#"<div><p> test </p><pre><code>
0
1
<p>foo</p>
2
3</code></pre></div>"#;
let node: Node<()> = parse_html(html).ok().flatten().expect("must parse");
//println!("node: {:#?}", node);
println!("html: {}", html);
println!("render: {}", node.render_to_string());
assert_eq!(expected, node.render_to_string());
}
result:
cargo test -p sauron --test html_parser_test
test test_pre_code ... ok
test 2
#[test]
fn test_pre_code_2() {
let html = r#"<pre><code>
0
<span>asdf</span>
<span>asdf</span>
<span>asdf</span>
</code></pre>"#;
let expected = r#"<pre><code>
<span>asdf</span>
<span>asdf</span>
<span>asdf</span>
</code></pre>"#;
let node: Node<()> = parse_html(html).ok().flatten().expect("must parse");
//println!("node: {:#?}", node);
println!("html: {}", html);
println!("render: {}", node.render_to_string());
assert_eq!(expected, node.render_to_string());
}
result:
cargo test -p sauron --test html_parser_test
test test_pre_code_2 ... FAILED
failures:
---- test_pre_code_2 stdout ----
html: <pre><code>
0
<span>asdf</span>
<span>asdf</span>
<span>asdf</span>
</code></pre>
render: <pre><code>
0
<span>asdf</span><span>asdf</span><span>asdf</span></code></pre>
thread 'test_pre_code_2' panicked at tests/html_parser_test.rs:97:5:
assertion `left == right` failed
left: "<pre><code>\n<span>asdf</span>\n <span>asdf</span>\n <span>asdf</span>\n</code></pre>"
right: "<pre><code>\n0\n<span>asdf</span><span>asdf</span><span>asdf</span></code></pre>"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test 3
#[test]
fn test_pre_code3() {
let html = r#"<div><p> test </p><pre><code>
0
1
2
3
</code></pre>
</div>"#;
let expected = r#"<div><p> test </p><pre><code>
0
1
2
3
</code></pre></div>"#;
let node: Node<()> = parse_html(html).ok().flatten().expect("must parse");
//println!("node: {:#?}", node);
println!("html: {}", html);
println!("render: {}", node.render_to_string());
assert_eq!(expected, node.render_to_string());
}
result:
cargo test -p sauron --test html_parser_test
test test_pre_code3 ... ok
test 4
#[test]
fn test_pre_code3_paragraphs_mix() {
let html = r#"<div><p> test </p><pre><code>
0
<p>1</p>
2
<p>3</p>
4
</code></pre>
</div>"#;
let expected = r#"<div><p> test </p><pre><code>
0
<p>1</p>
2
<p>3</p>
4
</code></pre></div>"#;
let node: Node<()> = parse_html(html).ok().flatten().expect("must parse");
//println!("node: {:#?}", node);
println!("html: {}", html);
println!("render: {}", node.render_to_string());
assert_eq!(expected, node.render_to_string());
// right "<div><p> test </p><pre><code>\n 0\n <p>1</p><p>2</p><p>3</p></code></pre></div>"
}
result:
test test_pre_code3_paragraphs_mix ... ok