
Conversation

@mattpitkin
Collaborator

This PR attempts to better match the ordering of the members of the observation and pulsar structures to those in tempo2.h, in case there are memory layout mismatch issues.

@mattpitkin
Collaborator Author

mattpitkin commented Apr 28, 2025

@vallis @vhaasteren @jellis18 @stevertaylor - sorry for spam tagging you, but I'd appreciate it if someone else took a look at this PR. I'm trying to fix the occasional errors seen in, e.g., #49, which I thought might be down to structure layout mismatches. So, I've tried to rearrange the definitions of the pulsar and observation structures to better match those in tempo2.h.

Along with that, I've added the ability for the test suite to run against a range of tempo2 versions, which will hopefully catch more issues. Currently, not all tests pass, e.g., https://github.com/vallis/libstempo/actions/runs/14718363847/job/41306896014?pr=80#step:8:77, and it always seems to be this test line that fails (although it does so quasi-randomly and will often pass if the test is re-run!). I expect this is still the same failure as in #49, i.e., a UTF-8 decode issue.
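As a toy illustration of the kind of failure a struct layout mismatch can produce (this uses ctypes and two made-up structs, nothing to do with tempo2's actual definitions), reading a double's bytes through a misdeclared char field yields garbage that won't decode as UTF-8:

import ctypes

# "Actual" mimics how a C library might lay out a struct; "Declared" is a wrapper
# that gets the member order wrong. Both structs are invented for this example.
class Actual(ctypes.Structure):
    _fields_ = [("freq", ctypes.c_double), ("flagID", ctypes.c_char * 8)]

class Declared(ctypes.Structure):
    _fields_ = [("flagID", ctypes.c_char * 8), ("freq", ctypes.c_double)]

obs = Actual(3.141592653589793, b"-be")
wrong = Declared.from_buffer_copy(bytes(obs))

print(wrong.flagID)           # b'\x18-DT\xfb!\t@' on a little-endian machine
wrong.flagID.decode("utf-8")  # raises UnicodeDecodeError (0xfb is not valid UTF-8)

Here the garbage is deterministic because the offsets are fixed, but with a real wrapper the misread bytes depend on whatever happens to sit at that offset, which is why the failure can look random.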

@mattpitkin
Collaborator Author

Note that (currently) all jobs pass, but only after I manually re-ran the failed ones. This shows that the failure is quasi-random, probably down to whether a copied string happens to contain some invalid memory or not. It would be useful to come up with a way to make the problem reproducible so that an actual solution can be found.
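One rough way to hunt for a reproducer (just a sketch, with placeholder par/tim file names rather than anything from the test suite) would be to hammer the load path in a loop and count decode failures:

import libstempo

failures = 0
for i in range(500):
    try:
        # placeholder file names; substitute whichever data set triggers the CI failure
        psr = libstempo.tempopulsar(parfile="test.par", timfile="test.tim")
        _ = psr.name  # string-valued attributes exercise the bytes -> str decode path
    except UnicodeDecodeError as exc:
        failures += 1
        print(f"iteration {i}: {exc}")

print(f"{failures}/500 loads hit a decode error")

Since the process memory layout is fixed once Python has started, it might also be necessary to run each load in a fresh interpreter (e.g., a shell loop over separate invocations) to see the quasi-random behaviour.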

@vhaasteren
Collaborator

Good job modifying the structs, @mattpitkin. Those flagID and flagVal pointers you changed were definitely not right! Could the UTF-8 error have something to do with the way libstempo currently parses strings? It defines:

# what is the default encoding here?
string = lambda s: s.decode()
string_dtype = 'U'

That's at the top of libstempo.pyx. I just asked ChatGPT that specific question, and it suggests changing it to:

def string(buf):
    # take bytes up to the first '\0'
    raw = bytes(buf).split(b'\0', 1)[0]
    # try UTF-8, else fall back to Latin-1 (one-to-one byte→codepoint)
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode('latin-1')
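For example (illustrative inputs only), trailing junk after the terminator is dropped and undecodable bytes no longer raise:

>>> string(b"J1713+0747\x00\xfe\xfe")   # everything after the first null byte is ignored
'J1713+0747'
>>> string(b"\xfb!\t@")                  # invalid UTF-8 falls back to latin-1 instead of raising
'û!\t@'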

I am not sure how that would create randomness, though: how would an encoding error be random? My first guess would also have been to double-check the memory layout, as you have done. Unfortunately I am not familiar enough with the underlying workings to be of more help, sorry!

@mattpitkin
Collaborator Author

@vhaasteren I think that might be part of the issue, so it's worth trying ChatGPT's suggestion. I think a bigger culprit is likely to be the use of sprintf to copy strings. Keeping with the theme of asking LLMs, yesterday I asked Gemini about the savepar function, and it suggested:

  1. Unsafe sprintf Usage (Highly Likely):

    The line stdio.sprintf(parFile, "%s", <char *>parfile_bytes) is problematic.

    - The Cast: <char *>parfile_bytes performs a direct C-level cast of the Python bytes object (parfile_bytes) to a C char *.
    - The Problem: A Python bytes object's internal buffer is not guaranteed to be null-terminated like a standard C string. The %s format specifier in sprintf expects its corresponding argument (the source string) to be a null-terminated char *.
    - The Segfault: If parfile_bytes doesn't happen to have a null byte (\0) exactly where its content ends within its allocated buffer, sprintf will continue reading past the end of the actual parfile_bytes data, copying garbage from adjacent memory into parFile until it randomly encounters a null byte somewhere in memory or reads from an invalid memory address, causing a segmentation fault.
    - Why Occasional? Whether it segfaults depends on the exact memory layout at runtime and whether sprintf happens to hit unmapped memory before finding a stray null byte. This makes the error intermittent.
    - Length Check: Your length check if len(parfile_bytes) > MAX_FILELEN - 1: correctly prevents sprintf from writing beyond the end of the destination buffer (parFile), but it does not fix the problem of sprintf potentially reading beyond the end of the source data (parfile_bytes).

with the suggestion to replace the sprintf line with:

# Replace the sprintf line with this:
cdef const char* parfile_c_str = parfile_bytes # Cython creates a temp C string
# Use strcpy now that length is checked and source is guaranteed null-terminated
strcpy(parFile, parfile_c_str)
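For completeness, strcpy would need to be cimported from libc.string. A slightly more defensive variant (just a sketch, with an invented helper name, assuming the destination is the existing MAX_FILELEN-sized parFile buffer) would bound the copy with strncpy:

from libc.string cimport strncpy

cdef void copy_filename(char *dest, bytes name_bytes, size_t maxlen):
    # The char* coerced from a Python bytes object points into CPython's internal
    # buffer, which is always null-terminated at the end of the stored data.
    cdef const char *src = name_bytes
    strncpy(dest, src, maxlen - 1)
    dest[maxlen - 1] = 0  # strncpy does not terminate on truncation, so do it explicitly

which would be called as copy_filename(parFile, parfile_bytes, MAX_FILELEN) after the existing length check.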

@vhaasteren
Collaborator

Oh, that's a really good suggestion!

Overall, the random crashes of tempo2/libstempo are a major headache, and they actually prevent people (including myself!) from using it. I have found that the segfaults have increased in frequency when running tempo2 on my M1 MacBook, either natively or emulated under Rosetta. Any adjustments that address these crashes are very welcome.

@mattpitkin mattpitkin changed the title libstempo.pyx: shuffle some observation/pulsar stricture contents libstempo.pyx: shuffle some observation/pulsar structure contents Apr 30, 2025
@mattpitkin mattpitkin changed the title libstempo.pyx: shuffle some observation/pulsar structure contents WIP: Attempt to fix quasi-random UTF-8 conversion failures/seg faults May 1, 2025
@vhaasteren
Collaborator

So @mattpitkin, @vallis, while this WIP is a good avenue, I decided to go with 'the other' workaround in the meantime: sandboxing! For me that is already a big win. I put it in #81, for now as a draft.
