
Data miscompares #71

@mcfadden8

This issue has been reproduced with the bugfix/data-miscompare branch of umap.

logfile.gz was generated by running umapsort as follows:

$ time umapsort -p 100000 -b 80000 -f /mnt/intel/sortfile -t 128 -u 1 >& /tmp/logfile
$ gzip /tmp/logfile

The problem was reproduced on a machine with the following configuration:

$ uname -a
Linux behemoth-rhel7 4.20.0-rc6uffd-wp-merged-01-180264-g9bc88be70eb1 #1 SMP Fri Dec 21 09:09:08 PST 2018 x86_64 x86_64 x86_64 GNU/Linux
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0-79
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             4
NUMA node(s):          4
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 47
Model name:            Intel(R) Xeon(R) CPU E7- 4850  @ 2.00GHz
Stepping:              2
CPU MHz:               1063.952
BogoMIPS:              3989.89
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              24576K
NUMA node0 CPU(s):     0-9,40-49
NUMA node1 CPU(s):     10-19,50-59
NUMA node2 CPU(s):     20-29,60-69
NUMA node3 CPU(s):     30-39,70-79

Line 698409 of the log, "0x7f007d96f000 0 NOT Present { 0 READ }", shows that we received a read-fault notification for page 0x7f007d96f000 and that the page is not currently present. The umap buffer was full, so it needed to evict the page shown on line 698410, "0x7f008b7af000 0x7f00731de9b0 EVICT". The last activity logged for that page was at line 358812, "0x7f008b7af000 0 NOT Present { 0 READ }", where it was copied in.
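
To trace a page's history the way it is done above, one can scan the decompressed log for every line that mentions the page address. A minimal Python sketch follows. `lines_mentioning` and `scan_logfile` are hypothetical helpers, and the plain substring match assumes each log entry prints the page address verbatim, as in the excerpts quoted in this issue.

```python
import gzip
from typing import Iterable, List


def lines_mentioning(lines: Iterable[str], page: str,
                     start: int, stop: int) -> List[int]:
    """Return the 1-based numbers of lines in [start, stop) that mention `page`.

    Plain substring match: assumes each log entry prints the page address
    verbatim (an assumption about the log format, not a umap guarantee).
    """
    hits = []
    for n, line in enumerate(lines, start=1):
        if n >= stop:
            break  # everything from `stop` onward is out of range
        if n >= start and page in line:
            hits.append(n)
    return hits


def scan_logfile(path: str, page: str, start: int, stop: int) -> List[int]:
    # Stream the gzip'd log as text instead of decompressing it to disk.
    with gzip.open(path, "rt") as f:
        return lines_mentioning(f, page, start, stop)


if __name__ == "__main__":
    # Tiny stand-in for the real log, built from the excerpts above.
    sample = [
        "0x7f008b7af000 0 NOT Present { 0 READ }",
        "0x7f007d96f000 0 NOT Present { 0 READ }",
        "0x7f008b7af000 0x7f00731de9b0 EVICT",
    ]
    print(lines_mentioning(sample, "0x7f008b7af000", 1, 3))  # [1]
```

Under that assumption, scan_logfile("/tmp/logfile.gz", "0x7f008b7af000", 358812, 698410) would return only [358812] if, as reported here, nothing else touched the page between the copy-in and the eviction.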

Instrumentation was added to the eviction code to compare the SHA1 of the page currently in memory with the SHA1 taken when the page was originally read from the backing store. A page is marked dirty when we receive a WP and/or WRITE message from UFFD for it. If the SHA1 differs and the page was never marked dirty, the instrumentation logs a message, as seen on line 698411: "0x7f008b7af000 0x7f00731de9b0 Dirty page found that was not previously marked dirty!"
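
The check can be modeled as a small per-page state machine: record a hash at fill time, set a dirty flag on any WP/WRITE notification, and at eviction flag any page whose hash changed while the flag stayed clear. The Python sketch below is illustrative only and is not the umap code on the bugfix/data-miscompare branch. `PageRecord`, `on_fill`, `on_write_notification`, and `check_on_evict` are hypothetical names, and the 4096-byte page size is an assumption.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PageRecord:
    """Hypothetical per-page bookkeeping (not umap's actual data structure)."""
    addr: int
    fill_sha1: bytes = b""
    dirty: bool = False


def on_fill(rec: PageRecord, contents: bytes) -> None:
    # Page was just copied in from the backing store: remember its SHA1.
    rec.fill_sha1 = hashlib.sha1(contents).digest()
    rec.dirty = False


def on_write_notification(rec: PageRecord) -> None:
    # UFFD reported a WP and/or WRITE event for this page.
    rec.dirty = True


def check_on_evict(rec: PageRecord, contents: bytes) -> bool:
    """Log and return True if the page changed without ever being marked dirty."""
    changed = hashlib.sha1(contents).digest() != rec.fill_sha1
    if changed and not rec.dirty:
        print(f"{rec.addr:#x} Dirty page found that was not previously marked dirty!")
        return True
    return False


if __name__ == "__main__":
    PAGE_SIZE = 4096  # assumed page size for the example
    page = PageRecord(addr=0x7f008b7af000)
    on_fill(page, bytes(PAGE_SIZE))
    modified = bytearray(PAGE_SIZE)
    modified[0] = 1  # simulate a write that produced no UFFD notification
    check_on_evict(page, bytes(modified))
```

Storing only a digest keeps the instrumentation's cost to 20 bytes per page (the size of a SHA1 digest) rather than a full shadow copy of every page.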

Note that no other events were logged for this page, yet our SHA1 instrumentation indicated that the page had changed.
