parse_html ignores whitespace and newlines in <pre><code> ... </code></pre> HTML #107

@qknight

When using parse_html(), <pre><code> sections seem to be parsed correctly only when they contain text nodes and no nested tags. As soon as a nested HTML element such as <span>...</span> is used, the formatting is lost: spaces and newlines (and probably tabs, too) are dropped.

After checking at fefit/rphtml#4 that the parser works according to the HTML specification, I now think the error I'm seeing comes from process_node(...).

I added the tests below to html_parser_test.rs.

After a bunch of tests I discovered:

  • When the first element in the <pre><code> is a text node, formatting works correctly.
  • If the next element is a tag (any tag; the example here contained "2"), it still works.
  • However, if the element after that is again a tag (the example here contained "3"), the indentation is lost,
  • but it works if that element is a text node. So the tag parser seems to introduce some state.
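
To make the suspected failure mode concrete, here is a minimal sketch of whitespace handling that preserves formatting below <pre>. It is not sauron's code: the `Html` type and the `strip_whitespace`, `render`, `el`, and `text` functions are hypothetical names invented for this illustration, and whether process_node(...) behaves this way is an assumption. The sketch drops whitespace-only text nodes between elements everywhere except under a <pre> ancestor, which is the behavior the tests below expect.

```rust
// Hypothetical, simplified node type for illustration; not sauron's `Node`.
#[derive(Debug, Clone, PartialEq)]
enum Html {
    Text(String),
    Element { tag: String, children: Vec<Html> },
}

// Small constructors to keep the example readable.
fn el(tag: &str, children: Vec<Html>) -> Html {
    Html::Element { tag: tag.to_string(), children }
}

fn text(s: &str) -> Html {
    Html::Text(s.to_string())
}

// Drop whitespace-only text nodes, except anywhere below a <pre> element.
// `in_pre` is passed down the recursion, so it cannot leak between siblings.
fn strip_whitespace(node: &Html, in_pre: bool) -> Option<Html> {
    match node {
        Html::Text(t) => {
            if !in_pre && t.trim().is_empty() {
                None
            } else {
                Some(node.clone())
            }
        }
        Html::Element { tag, children } => {
            let in_pre = in_pre || tag == "pre";
            let children = children
                .iter()
                .filter_map(|c| strip_whitespace(c, in_pre))
                .collect();
            Some(Html::Element { tag: tag.clone(), children })
        }
    }
}

// Serialize the tree back to a string (no attributes, no escaping).
fn render(node: &Html) -> String {
    match node {
        Html::Text(t) => t.clone(),
        Html::Element { tag, children } => {
            let inner: String = children.iter().map(render).collect();
            format!("<{0}>{1}</{0}>", tag, inner)
        }
    }
}
```

Calling `strip_whitespace(&tree, false)` on a tree whose <code> children alternate between whitespace text nodes and <span> elements keeps every text node, because the flag is derived from the ancestors rather than from the previously visited sibling.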

test 1

#[test]
fn test_pre_code() {
    let html = r#"<div><p> test </p>
<pre><code>
0
  1
  <p>foo</p>
  2
3</code></pre>
</div>"#;
    let expected = r#"<div><p> test </p><pre><code>
0
  1
  <p>foo</p>
  2
3</code></pre></div>"#;
    let node: Node<()> = parse_html(html).ok().flatten().expect("must parse");
    //println!("node: {:#?}", node);
    println!("html: {}", html);
    println!("render: {}", node.render_to_string());
    assert_eq!(expected, node.render_to_string());
}

result:

cargo test -p sauron --test html_parser_test
test test_pre_code ... ok

test 2

#[test]
fn test_pre_code_2() {
    let html = r#"<pre><code>
0
<span>asdf</span>
  <span>asdf</span>
  <span>asdf</span>
</code></pre>"#;
    let expected = r#"<pre><code>
<span>asdf</span>
  <span>asdf</span>
  <span>asdf</span>
</code></pre>"#;

    let node: Node<()> = parse_html(html).ok().flatten().expect("must parse");
    //println!("node: {:#?}", node);
    println!("html: {}", html);
    println!("render: {}", node.render_to_string());
    assert_eq!(expected, node.render_to_string());
}

result

cargo test -p sauron --test html_parser_test
test test_pre_code_2 ... FAILED

failures:

---- test_pre_code_2 stdout ----
html: <pre><code>
0
<span>asdf</span>
  <span>asdf</span>
  <span>asdf</span>
</code></pre>
render: <pre><code>
0
<span>asdf</span><span>asdf</span><span>asdf</span></code></pre>
thread 'test_pre_code_2' panicked at tests/html_parser_test.rs:97:5:
assertion `left == right` failed
  left: "<pre><code>\n<span>asdf</span>\n  <span>asdf</span>\n  <span>asdf</span>\n</code></pre>"
 right: "<pre><code>\n0\n<span>asdf</span><span>asdf</span><span>asdf</span></code></pre>"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

test 3

#[test]
fn test_pre_code3() {
    let html = r#"<div><p> test </p><pre><code>
0
  1
  2
3
</code></pre>
</div>"#;
    let expected = r#"<div><p> test </p><pre><code>
0
  1
  2
3
</code></pre></div>"#;

    let node: Node<()> = parse_html(html).ok().flatten().expect("must parse");
    //println!("node: {:#?}", node);
    println!("html: {}", html);
    println!("render: {}", node.render_to_string());
    assert_eq!(expected, node.render_to_string());
}

result

cargo test -p sauron --test html_parser_test
test test_pre_code3 ... ok

test 4

#[test]
fn test_pre_code3_paragraphs_mix() {
    let html = r#"<div><p> test </p><pre><code>
  0
  <p>1</p>
  2
<p>3</p>
  4
</code></pre>
</div>"#;
    let expected = r#"<div><p> test </p><pre><code>
  0
  <p>1</p>
  2
<p>3</p>
  4
</code></pre></div>"#;

    let node: Node<()> = parse_html(html).ok().flatten().expect("must parse");
    //println!("node: {:#?}", node);
    println!("html: {}", html);
    println!("render: {}", node.render_to_string());
    assert_eq!(expected, node.render_to_string());
    // right "<div><p> test </p><pre><code>\n  0\n  <p>1</p><p>2</p><p>3</p></code></pre></div>"
}

result

test test_pre_code3_paragraphs_mix ... ok
